Abstract

In this paper we show that payment computation essentially does not present any obstacle in de- 
signing truthful mechanisms, even for multi-parameter domains, and even when we can only call the 
allocation rule once. We present a general reduction that takes any allocation rule which satisfies "cyclic 
monotonicity" (a known necessary and sufficient condition for truthfulness) and converts it to a truthful 
mechanism using a single call to the allocation rule, with arbitrarily small loss to the expected social 
welfare. 

A prominent example for a multi-parameter setting in which an allocation rule can only be called 
once arises in sponsored search auctions. These are multi-parameter domains when each advertiser has 
multiple possible ads he may display, each with a different value per click. Moreover, the mechanism 
typically does not have complete knowledge of the click-realization or the click-through rates (CTRs); 
it can only call the allocation rule a single time and observe the click information for ads that were pre- 
sented. On the negative side, we show that an allocation that is truthful for any realization essentially 
cannot depend on the bids, and hence cannot do better than random selection for one agent. We then 
consider a relaxed requirement of truthfulness, only in expectation over the CTRs. Even for that relaxed 
version, making any progress is challenging, as standard techniques for constructing truthful mechanisms
(such as VCG or an MIDR allocation rule) cannot be used in this setting. We design an allocation
rule with non-trivial performance and directly prove it is cyclic-monotone, and thus it can be used to 
create a truthful mechanism using our general reduction. 
1 Introduction



In this paper we show that payment computation essentially does not present any obstacle in designing 
truthful mechanisms, even for multi-parameter domains, and even when we can only call the allocation rule
once. This extends the result of Babaioff et al. (2010) for single-parameter domains to multi-parameter
domains. We present a general reduction that takes any allocation rule which satisfies "cyclic monotonicity"
(a known necessary and sufficient condition for truthfulness) and converts it to a truthful mechanism using a
single call to the allocation rule, with arbitrarily small loss to the expected social welfare. The mechanism 
does not compute the payments explicitly but rather charges random payments having the right expectation. 

Such a reduction is particularly attractive as it can handle multi-parameter settings where it is impossible 
to decouple the computation of the allocation from the actual execution of the allocation. In such situations, 
the entire mechanism — including the payment computation — can only execute a single call to the allo- 
cation rule. We call this the "no-simulation" constraint; it can arise when a mechanism interacts with the 
environment, and the information revealed by the environment depends on the choices made by the alloca- 
tion rule. The no-simulation constraint is a significant hurdle because the existing approaches to payment 
computation require multiple calls to the allocation rule, with different vectors of bids. 

Sponsored search auctions supply a prominent example of a multi-parameter setting with the no-simulation 
constraint. In this setting each advertiser has multiple possible ads he is interested in displaying, each with 
a different value per click, and the mechanism does not have complete knowledge of the click-realization or 
the click-through rates (CTRs). Instead, it can only allocate ad impressions and observe the click information
for ads that were presented. The no-simulation constraint also arises in other contexts, such as packet
routing (Shnayder et al., 2012).

We note that our reduction — the multi-parameter transformation — has other uses beyond settings 
with the no-simulation constraint. For example, it can also be used to speed up the computation of payments 
in most multi-parameter mechanisms. Indeed, it has already been used for this purpose by two recent
papers. Jain et al. (2011) used it to speed up the payment computation for a mechanism that allocates batch
jobs in a cloud system. Huang and Kannan (2012) used it to compute payments for their privacy-preserving
procurement auction for spanning trees, which is based on the well-known "exponential privacy mechanism"
from prior work (McSherry and Talwar, 2007).



Sponsored search mechanisms with unknown CTRs. In the remainder of the paper we focus on the 
problem of designing truthful mechanisms for an archetypical multi-parameter setting with the no-simulation 
constraint: sponsored search auctions with unknown click-through rates (CTRs). The difficulty in designing 
such allocation rules stems from the fact that the welfare of a given allocation depends on clicks of the allo- 
cated ads, which are unknown to the bidders and to the mechanism. This prevents us from using the VCG 
mechanism since it depends on choosing a welfare-maximizing allocation. Yet, it is possible that welfare can
at least be approximated. 

Mechanisms that are truthful for every realization of the clicks would be most attractive, as the strategic 
behavior in such mechanisms would not depend on the agents' beliefs about the process generating the clicks
(for example, the belief that clicks for each ad are i.i.d. from a fixed distribution). Such mechanisms were
constructed in Babaioff et al. (2009, 2010) for the single-parameter version of the problem. Unfortunately,
the multi-parameter setting is much harder. In the setting of sponsored search with multiple ads per bidder 
and unknown CTRs, we show that if the mechanism is required to be truthful for every realization of the 
clicks, then it must be a trivial mechanism that outputs a fixed allocation (or distribution over allocations) 
with no dependence on the bids. 
In light of this negative result we consider a weaker notion of truthfulness. Assume that clicks are
stochastic (meaning that each ad has a CTR, and clicks are independent Bernoulli trials with the specified 
click probabilities) but the CTRs are not known. The mechanism is required to be truthful for every vector 
of CTRs; we call mechanisms with this property stochastically truthful. The VCG mechanism still cannot 
be used as we cannot maximize the expected welfare without knowing the CTRs. An alternative is to use 
a maximal-in-distributional-range (MIDR) allocation rule combined with VCG-based payment rule, but we 
show that for a natural family of MIDR allocation rules (in which the set of distributions the rule optimizes 
over is independent of the CTRs) the performance of such rules is no better than randomly selecting an ad 
to present. 

There are a few examples in the literature of non-VCG-based truthful multi-parameter mechanisms in
which bidders freely choose an option from a hand-crafted menu of allocations and prices, e.g. (Bartal et al.,
2003; Dobzinski et al., 2006; Dobzinski and Nisan, 2011), but this technique similarly fails in our setting
because the bidders do not have a dominant strategy for choosing from such a menu when they do not know 
their own CTRs. 

Given all these negative results we turn to our multi-parameter transformation which reduces the problem
of designing truthful randomized mechanisms to the (seemingly simpler) problem of designing cyclically
monotone (CMON) allocation rules. In contrast to the negative result for truthfulness for every realization,
we directly craft an allocation rule that satisfies stochastic CMON; to our knowledge, the only previous paper
to successfully apply this approach is (Lavi and Swamy, 2007). Using the transformation we construct a
stochastically truthful mechanism that outperforms the naive random allocation for a single agent, when the 
difference in value-per-impression of his ads is sufficiently large. While this is clearly just a small step, it 
proves to be rather challenging, and relies heavily on the multi-parameter transformation described above. 



Related work. Our earlier paper (Babaioff et al., 2010) considers the limited case of single parameter
domains. It introduced the technique of designing black-box transformations that perform implicit pay- 
ment computation while evaluating a given monotone allocation function only once. The same paper intro- 
duced monotone allocation rules with strong welfare guarantees for sponsored search auctions with unknown 
CTRs, by modifying multi-armed bandit algorithms to achieve the requisite monotonicity property. As all 
the results in our earlier paper are limited to single-parameter settings, they only apply to sponsored search 
when each advertiser has only one ad to display. In the present paper, we show that the black-box transfor- 
mation extends readily from single-parameter to multi-parameter settings, whereas extending the results on 
sponsored search to multi-parameter settings is much more delicate, and in some cases (i.e. for the strongest 
notion of truthfulness) outright impossible.



Wilkens and Sivan (2012) extended the results of (Babaioff et al., 2010) to multi-parameter domains
under some limitations. Their work provides a black-box transformation that allows implicit payment
computation when the allocation function is maximal-in-distributional-range (MIDR). While the MIDR property
is the most widely used method for achieving truthfulness in multi-parameter settings, it is not a necessary 
condition for truthfulness. In fact several papers (including this one) depend on multi-parameter mecha- 
nisms that are not MIDR. By presenting an implicit payment computation procedure that works whenever 
there exists a truthful mechanism utilizing the given allocation function, we believe that we have posed the 
multi-parameter transformation at the appropriate level of generality for future applications. 

The literature contains surprisingly few examples of truthful multi-parameter mechanisms that are not
based on MIDR allocation rules. Mechanisms designed by Bartal et al. (2003); Dobzinski et al. (2006);
Dobzinski and Nisan (2011) for various combinatorial auction domains make use of what might be termed
the pricing technique: each agent is allowed to choose freely from a menu of alternatives, each specifying
an allocation and price. The menu presented to a given agent may depend on the others' bids, but it must
be carefully constructed so that self-interested agents each choosing from their own menu will never jointly 
select an infeasible allocation. The taxation principle (Guesnerie, 1981; Hammond, 1979) implies that every
dominant-strategy truthful mechanism can actually be represented this way, provided that agents are able 
to evaluate their own utilities for different allocations before the allocation is actually executed. In settings 
with the no-simulation constraint, the taxation principle does not apply because agents can only evaluate 
their utility ex post. In the sponsored search setting, for example, agents have no dominant strategy for 
choosing from a menu listing bundles of ad impressions, because without knowing CTRs they can't pre- 
cisely determine the value of an impression; on the other hand, the mechanism is powerless to offer a menu 
listing bundles of clicks, because there is no way to guarantee that a bidder who chooses a certain bundle 
will receive the specified number of clicks. 

Apart from mechanisms with MIDR allocation rules and those based on the pricing technique, we are 
aware of only one other mechanism in the literature that is dominant-strategy truthful in a multi-parameter
setting: the scheduling mechanism of Lavi and Swamy (2007) for unrelated machines that have only two
possible processing times. Their mechanism, like ours, is designed by directly constructing an allocation 
function that satisfies the cyclic monotonicity constraints. 



2 Preliminaries 



We study reductions from allocations to truthful mechanisms for multi-parameter domains. A CS-oriented
background on multi-parameter mechanisms can be found in Archer and Kleinberg (2008b,a), while an
Economics-oriented background can be found in Ashlagi et al. (2010). Our main result holds for a very
general framework for multi-parameter mechanisms, described below, where agents' types are defined as
mappings from outcomes to valuations. Our reduction invokes the allocation rule only once, which makes it
particularly useful in domains in which the allocation rule cannot be invoked (or simulated) more than once 
due to informational constraints. 



Types, outcomes, and mechanisms. Multi-parameter mechanisms are defined as follows. There are $n$
agents and a set $\mathcal{O}$ of outcomes. Each agent $i$ is characterized by his type $x_i : \mathcal{O} \to \mathbb{R}$, where $x_i(o)$ is
interpreted as the agent's valuation for the outcome $o \in \mathcal{O}$. For each agent $i$ there is a set of feasible types,
denoted $\mathcal{T}_i$. Denote $\mathcal{T} = \mathcal{T}_1 \times \dots \times \mathcal{T}_n$ and call it the type space; call $\mathcal{T}_i$ the type space of agent $i$. The
mechanism knows $(n, \mathcal{O}, \mathcal{T})$, but not the actual types $x_i$; each type is known only to the corresponding
agent $i$. Formally, a problem instance, also called a multi-parameter domain, is a tuple $(n, \mathcal{O}, \mathcal{T})$.

A (direct revelation) mechanism $\mathcal{M}$ consists of the pair $(\mathcal{A}, \mathcal{P})$, where $\mathcal{A} : \mathcal{T} \to \mathcal{O}$ is the allocation rule
and $\mathcal{P} : \mathcal{T} \to \mathbb{R}^n$ is the payment rule. Both $\mathcal{A}$ and $\mathcal{P}$ can be randomized, possibly with a common random
seed. Each agent $i$ reports a type $b_i \in \mathcal{T}_i$ to the mechanism, which is called the bid of this agent. We denote
the vector of bids by $b = (b_1, \dots, b_n) \in \mathcal{T}$. The mechanism receives the bid vector $b \in \mathcal{T}$, selects an
outcome $\mathcal{A}(b)$, and charges each agent $i$ a payment of $\mathcal{P}_i(b)$. The utilities are quasi-linear and agents are
risk-neutral: if agent $i$ has type $x_i \in \mathcal{T}_i$ and the bid vector is $b \in \mathcal{T}$, then this agent's utility is

$$u_i(x_i; b) = \mathbb{E}_{\mathcal{M}}\left[\, x_i(\mathcal{A}(b)) - \mathcal{P}_i(b) \,\right]. \tag{1}$$

For each type $x_i \in \mathcal{T}_i$ of agent $i$ we use a standard notation $(b_{-i}, x_i)$ to denote the bid vector $\hat{b}$ such
that $\hat{b}_i = x_i$ and $\hat{b}_j = b_j$ for every agent $j \ne i$.
Game-theoretic properties. A mechanism is truthful if for every agent $i$ truthful bidding is a dominant
strategy:

$$u_i(x_i; (b_{-i}, x_i)) \ge u_i(x_i; b) \qquad \forall x_i \in \mathcal{T}_i,\ b \in \mathcal{T}. \tag{2}$$

An allocation rule is called truthfully implementable if it is the allocation rule in some truthful mechanism.

A mechanism is individually rational (IR) if each agent $i$ never receives negative utility by participating
in the mechanism and bidding truthfully:

$$u_i(x_i; (b_{-i}, x_i)) \ge 0 \qquad \forall x_i \in \mathcal{T}_i,\ b_{-i} \in \mathcal{T}_{-i}. \tag{3}$$

The right-hand side in Equation (3) represents the maximal guaranteed utility of an "outside option"
(i.e., from not participating in the mechanism). For example, our definition of IR is meaningful whenever 
this utility is 0, which is a typical assumption for most multi-parameter domains studied in the literature. 

Note that if the mechanism is randomized, the above properties are defined in expectation over the 
internal random seed. We can also define utility (and, accordingly, truthfulness and IR) for a given realization 
of the random seed. We say a mechanism is universally truthful if it is truthful for all realizations of the 
random seed; similarly for IR and other properties. 
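
To make inequalities (2) and (3) concrete, here is a minimal sketch that checks them by brute force on a toy domain. Everything in it is an assumption made for illustration: a single item, two agents, a five-point value grid, and the helper names `make_type`, `second_price`, `first_price`, `is_truthful` and `is_individually_rational`. Types are represented as mappings from outcomes to valuations, as in the general framework above.

```python
from itertools import product

OUTCOMES = (0, 1)                      # outcome = index of the agent who receives the item
VALUES = (0.0, 0.25, 0.5, 0.75, 1.0)   # hypothetical grid of values for winning


def make_type(agent, value):
    """A type x_i: a mapping from outcomes to valuations."""
    return {o: (value if o == agent else 0.0) for o in OUTCOMES}


TYPES = [[make_type(i, v) for v in VALUES] for i in (0, 1)]


def second_price(bids):
    """Returns (outcome, payments); the winner pays the other declared value."""
    declared = [bids[0][0], bids[1][1]]
    winner = 0 if declared[0] >= declared[1] else 1   # ties go to agent 0
    payments = [0.0, 0.0]
    payments[winner] = declared[1 - winner]
    return winner, payments


def first_price(bids):
    """Returns (outcome, payments); the winner pays its own declared value."""
    declared = [bids[0][0], bids[1][1]]
    winner = 0 if declared[0] >= declared[1] else 1
    payments = [0.0, 0.0]
    payments[winner] = declared[winner]
    return winner, payments


def utility(mechanism, i, true_type, own_bid, other_bid):
    """u_i(x_i; b) for a deterministic mechanism, following Equation (1)."""
    bids = [None, None]
    bids[i], bids[1 - i] = own_bid, other_bid
    outcome, payments = mechanism(bids)
    return true_type[outcome] - payments[i]


def is_truthful(mechanism):
    """Inequality (2) for every agent, true type, deviation and opposing bid."""
    return all(
        utility(mechanism, i, x, x, other) >= utility(mechanism, i, x, dev, other) - 1e-12
        for i in (0, 1)
        for x, dev, other in product(TYPES[i], TYPES[i], TYPES[1 - i])
    )


def is_individually_rational(mechanism):
    """Inequality (3): truthful bidding never yields negative utility."""
    return all(
        utility(mechanism, i, x, x, other) >= -1e-12
        for i in (0, 1)
        for x, other in product(TYPES[i], TYPES[1 - i])
    )
```

On this grid the second-price rule passes both checks, while the first-price rule is individually rational but not truthful, since a winner can lower its payment by shading its bid.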

Our assumptions. We make two assumptions on the type space T: 

• non-negative types: $x_i(o) \ge 0$ for each agent $i$, type $x_i \in \mathcal{T}_i$, each outcome $o \in \mathcal{O}$.

• rescalable types: $\lambda x_i \in \mathcal{T}_i$ for each agent $i$, type $x_i \in \mathcal{T}_i$, and any parameter $\lambda \in [0, 1]$. ($\lambda x_i$ denotes
the type $x'_i$ whose valuation for every outcome $o$ satisfies $x'_i(o) = \lambda x_i(o)$.)

In particular, for each agent $i$ there exists a zero type: a type $x_i \in \mathcal{T}_i$ such that $x_i(\cdot) = 0$. Let us say
that a mechanism is normalized if for each agent $i$, the expected payment of this agent is 0 whenever she
submits the zero type.

For domains with non-negative types, it is desirable that all agents are charged a non-negative amount; 
this is known as the no-positive-transfers property. 

Dot-product valuations. An important special case is dot-product valuations, where the type $x \in \mathcal{T}_i$ of
each agent $i$ can be decomposed as a dot product $x(o) = \beta_x \cdot a_i(o)$, for each outcome $o \in \mathcal{O}$, where
$\beta_x, a_i(o) \in \mathbb{R}^d$ are some finite-dimensional vectors. Here the term $a_i(o)$ is the same for all types $x \in \mathcal{T}_i$
(and known to the mechanism), whereas $\beta_x$ is the same for all outcomes $o \in \mathcal{O}$ and is known only to agent
$i$. The term $a_i(o)$ is usually called an "allocation" of agent $i$ for outcome $o$, and $\beta_x$ is called the "private
value". Single-parameter domains correspond to the case $d = 1$.

Note that the type $x$ of each agent $i$ is determined by the corresponding private value $\beta_x$, and his type
space $\mathcal{T}_i$ is determined by $D_i = \{\beta_x : x \in \mathcal{T}_i\} \subset \mathbb{R}^d$. Because of this, in the literature on dot-product
valuations the term "type" often refers to $\beta_x$. To avoid ambiguity, in this section we will refer to $\beta_x$ as
"private value" rather than "type", and call $D_1 \times \dots \times D_n$ the private value space.

In a domain with dot-product valuations, types are rescalable if and only if

$$\beta_x \in D_i \;\Rightarrow\; \lambda \beta_x \in D_i \quad \text{for each } \lambda \in [0, 1].$$

In words, assuming rescalable types is equivalent to assuming that the set $D_i$ is star-convex at 0. To ensure
non-negative types, it suffices to assume that $D_i \subset \mathbb{R}^d_+$ for each agent $i$, and all allocations are non-negative:
$a_i(o) \in \mathbb{R}^d_+$ for all $o \in \mathcal{O}$.
Truthfulness characterization. We will use a characterization of truthful mechanisms via a property
called "cycle-monotonicity".

A (randomized) allocation rule $\mathcal{A}$ is cycle-monotone (henceforth abbreviated as CMON) if the following
property holds: for each bid vector $b \in \mathcal{T}$, each agent $i$, each $k \ge 2$, and each $k$-tuple $x_{i,0}, x_{i,1}, \dots, x_{i,k} \in \mathcal{T}_i$
of this agent's types, we have

$$\mathbb{E}_{\mathcal{A}}\left[ \sum_{j=0}^{k} x_{i,j}(o_j) - x_{i,(j-1) \bmod k}(o_j) \right] \ge 0, \quad \text{where } o_j = \mathcal{A}(b_{-i}, x_{i,j}) \in \mathcal{O}. \tag{4}$$

Recall that we are using a general notion of agents' types (and bids), which are defined as functions from
outcomes to real-valued valuations.

It is known that A is truthfully implementable if and only if it is cycle-monotone, in which case the 
corresponding payment rule is essentially fixed. 
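
As an illustration of the cycle-monotonicity condition, the following sketch checks it exhaustively for a deterministic rule on a small finite domain. It is not part of the paper's construction: the toy single-item domain, the value grid, the two rules `highest_wins` and `lowest_wins`, and the bound `max_cycle_len` are all assumptions. The check uses the standard cyclic form of the inequality over $k$ distinct types, with index $j-1$ taken modulo $k$.

```python
from itertools import permutations

OUTCOMES = (0, 1)                      # outcome = index of the agent who receives the item
VALUES = (0.0, 0.25, 0.5, 0.75, 1.0)   # hypothetical value grid


def make_type(agent, value):
    """A type: a mapping from outcomes to valuations."""
    return {o: (value if o == agent else 0.0) for o in OUTCOMES}


def highest_wins(bids):
    """Allocation rule: the agent with the highest declared value wins."""
    return 0 if bids[0][0] >= bids[1][1] else 1


def lowest_wins(bids):
    """Allocation rule that is deliberately not monotone."""
    return 0 if bids[0][0] <= bids[1][1] else 1


def is_cycle_monotone(allocate, agent=0, max_cycle_len=3):
    """Check the cyclic inequality for all cycles of up to max_cycle_len distinct types."""
    own_types = [make_type(agent, v) for v in VALUES]
    other_types = [make_type(1 - agent, v) for v in VALUES]
    for other in other_types:
        # Outcome o_j obtained when `agent` reports its j-th type.
        outcomes = []
        for t in own_types:
            bids = [None, None]
            bids[agent], bids[1 - agent] = t, other
            outcomes.append(allocate(bids))
        for k in range(2, max_cycle_len + 1):
            for cycle in permutations(range(len(own_types)), k):
                # cycle[j - 1] with j = 0 is cycle[-1], i.e. index (j - 1) mod k.
                total = sum(
                    own_types[cycle[j]][outcomes[cycle[j]]]
                    - own_types[cycle[j - 1]][outcomes[cycle[j]]]
                    for j in range(k)
                )
                if total < -1e-12:
                    return False
    return True
```

The highest-bid rule passes the check for both agents, whereas `lowest_wins` already fails on a cycle of length two (a low type that wins paired with a high type that loses).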



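Theorem 2.1 (Rochet (1987)). Consider an arbitrary multi-parameter domain $(n, \mathcal{O}, \mathcal{T})$. A (randomized)
allocation rule $\mathcal{A}$ is truthfully implementable if and only if it is cycle-monotone. Assuming rescalable types,
for any cycle-monotone allocation rule $\mathcal{A}$, a mechanism $(\mathcal{A}, \mathcal{P})$ is truthful and normalized if and only if

$$\mathbb{E}\left[ \mathcal{P}_i(b) \right] = \mathbb{E}\left[ b_i(\mathcal{A}(b)) - \int_{t=0}^{1} b_i(\mathcal{A}(b_{-i}, t\, b_i))\, dt \right]. \tag{5}$$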



This characterization generalizes a well-known truthfulness characterization of single-parameter mechanisms
in terms of monotonicity, due to (Myerson, 1981; Archer and Tardos, 2001). Recall that for single-parameter
domains, the type of each agent $i$ is captured by a single number (the private value $v_i$), and the
outcome pertinent to this agent is also captured by a single number (this agent's allocation $a_i(o)$). The bid
of agent $i$ is represented by $b_i \in \mathbb{R}$. Cycle-monotonicity is then equivalent to a much simpler property called
monotonicity: for each agent, fixing the bids of other agents, increasing this agent's bid cannot decrease this
agent's allocation. The payment formula (5) can also be simplified, e.g. for non-negative valuations it is

$$\mathcal{P}_i(b) = b_i\, \mathcal{A}_i(b_{-i}, b_i) - \int_{0}^{b_i} \mathcal{A}_i(b_{-i}, u)\, du. \tag{6}$$
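
The single-parameter payment formula can be evaluated numerically once the allocation is known as a function of the agent's own bid. The sketch below assumes a hypothetical monotone rule, `linear_alloc`, whose allocation equals the bid on $[0, 1]$, and approximates the integral with a midpoint rule; the function names and the step count are illustrative choices only.

```python
def myerson_payment(alloc, bid, steps=1000):
    """bid * alloc(bid) minus the integral of alloc over [0, bid] (midpoint rule)."""
    width = bid / steps
    integral = sum(alloc((k + 0.5) * width) for k in range(steps)) * width
    return bid * alloc(bid) - integral


def linear_alloc(u):
    """Hypothetical monotone rule: allocation equals the bid, for bids in [0, 1]."""
    return u


def utility(value, bid):
    """Quasi-linear utility of an agent with true value `value` who bids `bid`."""
    return value * linear_alloc(bid) - myerson_payment(linear_alloc, bid)


def best_bid_on_grid(value, grid_size=100):
    """Utility-maximizing bid among the multiples of 1/grid_size in [0, 1]."""
    grid = [k / grid_size for k in range(grid_size + 1)]
    return max(grid, key=lambda bid: utility(value, bid))
```

For this rule the payment works out to $b^2/2$, so the utility $v b - b^2/2$ is maximized at $b = v$, which is what the grid search recovers.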



External seed. We allow allocation rules to receive input from the environment; a canonical example
is pay-per-click auctions where such input consists of user clicks. Formally, the allocation rule and the
payment rule depend on the additional argument $\omega$ which captures all relevant input from the environment.
(To simplify the notation, we keep the dependence on $\omega$ implicit.) We call $\omega$ the external seed, to distinguish
from the internal random seed of the mechanism. We assume that $\omega$ is an independent sample from some
fixed distribution $\mathcal{D}_{\text{ext}}$; this distribution may be unknown to the mechanism.

All game-theoretic properties defined above carry over to mechanisms with external seed if all expectations
are over both internal and external seed. In particular, Theorem 2.1 carries over with no other
modification.

We are primarily interested in properties that hold in expectation over the external seed, for all possible
distributions $\mathcal{D}_{\text{ext}}$ over the external seed. The corresponding version of a given property P is called stochastically
P. For example, we are interested in mechanisms that are stochastically truthful, and this requires
the allocation rules to be stochastically CMON.

We also define a stronger version of truthfulness: one that holds for each realization of the external 
seed. For each game-theoretic property P described above, such as truthfulness, IR and CMON, a version that 
holds for each realization of the external seed will be called ex-post P. Theorem 2.1 holds for every given
realization of the external seed (but requires the allocation rule to satisfy ex-post CMON).

A crucial way in which the external seed is different from the internal randomness is that a given run 
of the allocation rule might not observe the entire external seed. More precisely, runs of the allocation rule 
on different bid vectors might observe different portions of the external seed. For example, if an ad is not 
displayed to a given user, the mechanism does not observe whether this user would have clicked on this ad if 
it were displayed. It follows that the mechanism might not be able to simulate the allocation rule on different 
bid vectors; this is precisely the "no-simulation" constraint discussed in the Introduction. Moreover, this
issue can affect payment computation: the payment prescribed by Equation (5), although well-defined as a
mathematical expression, might not be computable given the information available to the mechanism.¹
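
The partial observability behind the no-simulation constraint can be pictured with a small sketch. The click table, the ad names and the toy rule `greedy_schedule` below are hypothetical; the only point is that a run reveals the clicks of the ads it actually displayed, so runs on different bid vectors may observe different portions of the external seed.

```python
# Hypothetical external seed: CLICKS[t][ad] says whether `ad` would be clicked in round t.
CLICKS = [
    {"ad1": True, "ad2": False},
    {"ad1": False, "ad2": True},
    {"ad1": True, "ad2": True},
]


def observe(schedule, clicks):
    """Portion of the external seed revealed by showing schedule[t] in round t."""
    return {(t, ad): clicks[t][ad] for t, ad in enumerate(schedule)}


def greedy_schedule(bids, rounds):
    """Toy allocation rule (an assumption): always display the highest-bid ad."""
    best = max(bids, key=bids.get)
    return [best] * rounds


seen_a = observe(greedy_schedule({"ad1": 0.9, "ad2": 0.4}, 3), CLICKS)
seen_b = observe(greedy_schedule({"ad1": 0.3, "ad2": 0.4}, 3), CLICKS)
```

Here the two runs observe disjoint parts of the click table, so the outcome of one run cannot be used to simulate the other.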

To address this issue formally, we say that the mechanism is information-feasible if for each run of the
mechanism (i.e., for each bid vector b, each realization of the mechanism's internal randomness, and every 
possible value of the external seed) the payments are uniquely determined given the information available 
to the mechanism. 



Implicit payment computation for single-parameter domains. Babaioff et al. (2010) provide an implicit
payment computation result for single-parameter domains. They prove that any monotone allocation
rule for any single-parameter domain can be transformed into a truthful, information-feasible mechanism 
with an arbitrarily small loss in expected welfare. The allocation rule is only invoked once. Below we quote 
a special case of this result that is most relevant to the present paper (restating it slightly to make it consistent 
with our notation.) 

Let $\mathcal{A}$ and $\tilde{\mathcal{A}}$ be two single-parameter allocation rules for the same domain. Let $D$ be a distribution on
$[0, 1]$. Allocation rule $\tilde{\mathcal{A}}$ is called bid-resampling with respect to $\mathcal{A}$, with resampling distribution $D$, if $\tilde{\mathcal{A}}$
proceeds as follows: for each agent $i$, the submitted bid is multiplied by some $\chi_i \in [0, 1]$, where $\chi_i$ is an
independent draw from distribution $D$, and the original allocation rule $\mathcal{A}$ is called with the modified bid
vector.



Theorem 2.2 (Babaioff et al. (2010)). Consider an arbitrary single-parameter domain where the private
values of each agent lie in the interval $[0, 1]$. Let $\mathcal{A}$ be a stochastically monotone allocation rule for this
domain. Then for each $\delta \in (0, 1)$ there exists an information-feasible mechanism $\mathcal{M}_\delta = (\tilde{\mathcal{A}}, \tilde{\mathcal{P}})$ with the
following properties.

(a) [Structure] Allocation rule $\tilde{\mathcal{A}}$ is bid-resampling with respect to $\mathcal{A}$ with resampling distribution that
depends on $\delta$ but not on $\mathcal{A}$.

(b) [Incentives] $\mathcal{M}_\delta$ is stochastically truthful and universally ex-post individually rational. If $\mathcal{A}$ is ex-post
monotone, then $\mathcal{M}_\delta$ is ex-post truthful.

(c) [Performance] For $n$ agents and any bid vector $b$ (and any fixed external seed) allocations $\tilde{\mathcal{A}}(b)$ and
$\mathcal{A}(b)$ are identical with probability at least $1 - n\delta$. Moreover, if $\mathcal{A}$ is $\alpha$-approximate (for social
welfare), then mechanism $\mathcal{M}_\delta$ is $\alpha / \left(1 - \frac{\delta}{2-\delta}\right)$-approximate.

(d) [Payments] $\mathcal{M}_\delta$ is ex-post no-positive-transfers; and although it is not universally so, for all realizations
of the internal seed it never pays any agent $i$ more than $b_i \cdot \mathcal{A}_i(x) \cdot \left(\frac{1}{\delta} - 1\right)$. $\mathcal{M}_\delta$ is universally
ex-post normalized.



1 This has been proved in (Babaioff et al., 2009; Devanur and Kakade, 2009) in the context of multi-armed bandit mechanisms,
see Section 4 for more details.
To make our exposition more self-contained, we now repeat the construction of the mechanism $\mathcal{M}_\delta$
from Babaioff et al. (2010).²



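1. Collect bid vector $b$.

2. Independently for each $i \in [n]$, randomly sample $\chi_i = 1$ with probability $1 - \delta$ and otherwise
$\chi_i = \gamma_i^{1/(1-\delta)}$, where $\gamma_i \in [0, 1]$ is sampled uniformly at random.

3. Construct the vector of modified bids, $x = (\chi_1 b_1, \dots, \chi_n b_n)$.

4. Allocate according to $\tilde{\mathcal{A}}(b) = \mathcal{A}(x)$.

5. Compute payments using the formula

$$\tilde{\mathcal{P}}_i(b) = b_i \cdot \mathcal{A}_i(x) \cdot \begin{cases} 1 & \text{if } \chi_i = 1, \\ 1 - \frac{1}{\delta} & \text{if } \chi_i < 1. \end{cases}$$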

Note that $\tilde{\mathcal{A}}$ is indeed bid-resampling with respect to $\mathcal{A}$, with a specific resampling distribution that is defined
in step 2.
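
The five steps can be sketched in a few lines of code. This is a minimal illustration rather than the authors' implementation: it assumes that the resampled factor in step 2 is $\gamma_i^{1/(1-\delta)}$ with $\gamma_i$ uniform on $[0, 1]$, and the names `run_once`, `linear_alloc` and `average_payment` are hypothetical. The allocation rule is called exactly once per run, and payments are random quantities whose average can be compared against the single-parameter payment formula.

```python
import random


def run_once(alloc, bids, delta, rng):
    """One run of the five steps; `alloc` is called exactly once."""
    # Step 2: keep each bid with probability 1 - delta, otherwise shrink it.
    resampled = [rng.random() < delta for _ in bids]
    chi = [rng.random() ** (1.0 / (1.0 - delta)) if r else 1.0 for r in resampled]
    # Step 3: modified bids.
    x = [c * b for c, b in zip(chi, bids)]
    # Step 4: the only call to the original allocation rule.
    allocation = alloc(x)
    # Step 5: charge b_i * A_i(x), or give a rebate if the bid was resampled.
    payments = [
        b * a * ((1.0 - 1.0 / delta) if r else 1.0)
        for b, a, r in zip(bids, allocation, resampled)
    ]
    return allocation, payments


def linear_alloc(x):
    """Hypothetical monotone rule: each agent's allocation equals its modified bid."""
    return list(x)


def average_payment(alloc, bids, delta, agent, runs, seed=0):
    """Monte Carlo estimate of the expected payment of `agent`."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(runs):
        _, payments = run_once(alloc, bids, delta, rng)
        total += payments[agent]
    return total / runs
```

For `linear_alloc`, the transformed allocation of an agent bidding $b$ is $b \cdot \mathbb{E}[\chi_i] = b\left(1 - \frac{\delta}{2-\delta}\right)$, so the monotonicity-based payment formula gives an expected payment of $b^2 (1-\delta)/(2-\delta)$; the Monte Carlo average should be close to that value.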



3 The multi-parameter transformation 

In this section we present our first main contribution: the implicit payment computation result for
multi-parameter domains. For a given multi-parameter domain and a given CMON allocation rule for this domain,³
our goal is to design a truthful, information-feasible mechanism with outcome that is almost always identical
to that of the original allocation rule, and this, in particular, ensures a small loss in expected welfare. We
achieve this goal for every CMON allocation rule and every multi-parameter domain (under a mild assumption
of rescalable, non-negative types). More precisely, we give a general "multi-parameter transformation"
which takes an arbitrary CMON allocation rule $\mathcal{A}$ and transforms it into a truthful, information-feasible
mechanism which implements the same outcome as $\mathcal{A}$ with probability arbitrarily close to 1. This mechanism
requires evaluating A only once; its allocation rule randomly modifies the submitted bids, and then calls 
A on the modified bids. The technical contribution here is showing that the natural generalization of the 
reduction for the single-parameter setting, to the multi-parameter setting, preserves all desired properties. 
The non-trivial part of the proof is showing that although the single-parameter transformation only ensures 
that each agent does not have an incentive to deviate by scaling all his bids by the same scalar in [0, 1], he 
also does not have an incentive to deviate to any other arbitrary bids. 

The transformation. Our multi-parameter transformation is a remarkably straightforward generalization
of the single-parameter transformation specified in Section 2. In fact, there is no need to rewrite the five
steps; the only thing that changes is the interpretation of the notation. Specifically, the bids $b_1, \dots, b_n$
should now be interpreted as elements of the type spaces $\mathcal{T}_1, \dots, \mathcal{T}_n$ rather than as scalars, and for each $i$ the
modified bid $\chi_i b_i$ is obtained by multiplying the abstract type $b_i$ (a function from outcomes to reals) by the
random scalar $\chi_i$. (Note that $\chi_i b_i$ is well-defined because we are assuming the rescalable types property.)
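
A minimal sketch of the multi-parameter version follows, under stated assumptions: bids are dictionaries mapping outcomes to declared valuations, the resampled factor is $\gamma_i^{1/(1-\delta)}$ with $\gamma_i$ uniform on $[0, 1]$, and agent $i$ is charged $b_i(o)$ times the same random factor as in the single-parameter payment step, where $o$ is the realized outcome. The names `rescale`, `multi_parameter_transform` and `max_welfare` are hypothetical, and the welfare-maximizing rule is used only as a convenient example of an allocation rule.

```python
import random


def rescale(bid, factor):
    """The rescaled type factor * bid: every outcome's valuation is multiplied."""
    return {outcome: factor * value for outcome, value in bid.items()}


def multi_parameter_transform(allocate, bids, delta, rng):
    """One run of the transformation; `allocate` is called exactly once."""
    resampled = [rng.random() < delta for _ in bids]
    chi = [rng.random() ** (1.0 / (1.0 - delta)) if r else 1.0 for r in resampled]
    modified = [rescale(b, c) for b, c in zip(bids, chi)]
    outcome = allocate(modified)                      # the only call
    payments = [
        b[outcome] * ((1.0 - 1.0 / delta) if r else 1.0)
        for b, r in zip(bids, resampled)
    ]
    return outcome, payments


def max_welfare(bids):
    """Example rule (an assumption): pick the outcome with the highest declared welfare."""
    return max(bids[0].keys(), key=lambda o: sum(b[o] for b in bids))
```

In every run each agent is either charged its declared value for the realized outcome or, when its bid was resampled, receives a rebate of $b_i(o)\left(\frac{1}{\delta} - 1\right)$; no second call to the allocation rule is needed.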

2 The paper states the construction more abstractly, in terms of a general self-resampling procedure. The simple description of
$\mathcal{M}_\delta$ that we present here was first published in Shnayder et al. (2012).
3 Recall that CMON is a necessary and sufficient condition for truthfulness. 
In the remainder of this section we analyze the properties of the multi-parameter transformation, proving
an analogue of Theorem 2.2. The subtlest step, which occupies most of the analysis, is to prove that the
modified allocation rule $\tilde{\mathcal{A}}$ satisfies CMON.

Induced single-parameter domains. To aid in the analysis, it will be helpful to introduce the following
notation. Consider a bid vector $b \in \mathcal{T}$ and a vector of "rescaling coefficients" $\lambda \in [0, 1]^n$. Denote

$$\lambda \otimes b = (\lambda_1 b_1, \dots, \lambda_n b_n) \in \mathcal{T}.$$

In other words, $\lambda \otimes b$ is the rescaled bid vector where the bid of each agent $i$ is $\lambda_i b_i$. Note that for each $b$
the subset

$$\mathcal{T}_b = \{\lambda \otimes b : \lambda \in [0, 1]^n\} \subset \mathcal{T}$$

forms a single-parameter type space where each agent $i$ has private value $\lambda_i \in [0, 1]$ and allocation $b_i(o)$
for every outcome $o$. By abuse of notation, let us treat the allocation and payment rules for $\mathcal{T}_b$ as functions
from the private value space $[0, 1]^n$ rather than the type space $\mathcal{T}_b$.

We want to prove that the mechanism $\mathcal{M}_\delta = (\tilde{\mathcal{A}}, \tilde{\mathcal{P}})$ defined by our transformation is truthful. As a
starting observation, note that when one applies the single-parameter transformation given in Section 2 to
the allocation rule defined by $\mathcal{A}_b(\lambda) = \mathcal{A}(\lambda \otimes b)$, one obtains a mechanism that coincides with the restriction
of $\mathcal{M}_\delta$ to $\mathcal{T}_b$. By Theorem 2.2 we may conclude that the restriction of $\mathcal{M}_\delta$ to the single-parameter type
space $\mathcal{T}_b$ is truthful. Yet this conclusion is not sufficient, since this truthfulness condition is actually weaker
than what we are aiming for: it ensures that a deviation inside the single-parameter type space $\mathcal{T}_b$ is not
beneficial, but says nothing about deviation to other types in $\mathcal{T} \setminus \mathcal{T}_b$. Nevertheless, our proof will show that
if the original allocation rule was CMON, the transformed allocation rule is also CMON for the domain $\mathcal{T}$, and
thus is truthful as needed.

Theorem 3.1. Consider an arbitrary multi-parameter domain $(n, \mathcal{O}, \mathcal{T})$ with rescalable, non-negative
types. Let $\mathcal{A}$ be a stochastically CMON allocation rule for this domain. Let $\mathcal{M}_\delta = (\tilde{\mathcal{A}}, \tilde{\mathcal{P}})$ be the
transformed mechanism for some parameter $\delta \in (0, 1)$. Then $\mathcal{M}_\delta$ has the following properties:

(a) [Structure] $\mathcal{M}_\delta$ is information-feasible.

(b) [Incentives] $\mathcal{M}_\delta$ is stochastically truthful and universally ex-post individually rational. If $\mathcal{A}$ is ex-post
CMON, then $\mathcal{M}_\delta$ is ex-post truthful.

(c) [Performance] For $n$ agents and any bid vector $b$ (and any fixed external seed) allocations $\tilde{\mathcal{A}}(b)$ and
$\mathcal{A}(b)$ are identical with probability at least $1 - n\delta$. Moreover, if $\mathcal{A}$ is an $\alpha$-approximation to the maximal

social welfare then $\tilde{\mathcal{A}}$ is an $\alpha / \left(1 - \frac{\delta}{2-\delta}\right)$-approximation to the maximal social welfare.

(d) [Payments] $\mathcal{M}_\delta$ is ex-post no-positive-transfers; and although it is not universally so, for all realizations
of the internal seed it never pays any agent $i$ more than $b_i(o)\left(\frac{1}{\delta} - 1\right)$, where $o = \tilde{\mathcal{A}}(b) \in \mathcal{O}$.
Additionally, $\mathcal{M}_\delta$ is universally ex-post normalized.

Proof. $\mathcal{M}_\delta$ is information-feasible by construction, since so are the single-parameter mechanisms obtained
from Theorem 2.2. All claimed properties except truthfulness follow immediately from Theorem 2.2. Below
we prove truthfulness.
We claim that $\tilde{\mathcal{A}}$ satisfies CMON. Indeed, fix bid vector $b \in \mathcal{T}$, agent $i$, some $k \ge 2$, and a $k$-tuple
$x_{i,0}, x_{i,1}, \dots, x_{i,k} \in \mathcal{T}_i$ of this agent's types. Let us consider a fixed realization of the random vector
$\chi \in [0, 1]^n$ in step 2 of mechanism $\mathcal{M}_\delta$. For each type $x_{i,j}$, note that we have

$$\tilde{\mathcal{A}}(x_{i,j}, b_{-i}) = \mathcal{A}(\chi \otimes (x_{i,j}, b_{-i})) \in \mathcal{O}.$$

Denote this outcome by $o_{i,j}(\chi)$. Let us apply the cycle-monotonicity of $\mathcal{A}$ for bid vector $\chi \otimes (x_{i,j}, b_{-i})$:

$$\mathbb{E}_{\mathcal{A}}\left[ \sum_{j=0}^{k} x_{i,j}(o_{i,j}(\chi)) - x_{i,(j-1) \bmod k}(o_{i,j}(\chi)) \right] \ge 0. \tag{7}$$

Recalling that $o_{i,j}(\chi) = \tilde{\mathcal{A}}(x_{i,j}, b_{-i})$, we observe that for this fixed realization of $\chi$, Equation (7) is exactly
the inequality in the definition of cycle-monotonicity for $\tilde{\mathcal{A}}$. Therefore taking expectation over $\chi$, we obtain
the desired inequality Equation (4) for $\tilde{\mathcal{A}}$. Claim proved.

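It remains to prove that in the transformed mechanism $(\tilde{\mathcal{A}}, \tilde{\mathcal{P}})$, the payment rule satisfies Equation (5).
Fix bid vector $b$ and consider the transformed single-parameter mechanism $(\tilde{\mathcal{A}}_b, \tilde{\mathcal{P}}_b)$ for the single-parameter
type space $\mathcal{T}_b$. In the terminology of single-parameter domains, each agent $i$ receives an allocation $\tilde{\mathcal{A}}_{b,i}(\lambda) =
b_i(\tilde{\mathcal{A}}_b(\lambda))$ whenever the bid vector is $\lambda \in [0, 1]^n$. Since this is a truthful and normalized single-parameter
mechanism, it follows that

$$\mathbb{E}\left[ \tilde{\mathcal{P}}_{b,i}(\lambda) \right] = \mathbb{E}\left[ \lambda_i\, \tilde{\mathcal{A}}_{b,i}(\lambda) - \int_0^{\lambda_i} \tilde{\mathcal{A}}_{b,i}(\lambda_{-i}, t)\, dt \right] \qquad \forall \lambda \in [0, 1]^n.$$

Plugging in $\lambda = \vec{1}$ and using the definitions of $\tilde{\mathcal{A}}_b$, $\tilde{\mathcal{P}}_b$, we obtain the desired Equation (5).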
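The payment identity used above is the standard characterization for truthful, normalized single-parameter mechanisms: the expected payment equals the bid times the allocation minus the area under the allocation curve. The following Python sketch is a numeric illustration only; the function name `myerson_payment`, the midpoint-rule integration, and the two toy allocation curves are our assumptions and do not appear in the mechanism itself.

```python
def myerson_payment(allocation, bid, steps=1000):
    """Approximate bid * A(bid) - integral_0^bid A(t) dt with the midpoint rule.

    `allocation` is a hypothetical monotone allocation curve t -> A(t),
    with the bids of the other agents held fixed.
    """
    width = bid / steps
    # Midpoint rule: evaluate A at the center of each of the `steps` slices.
    integral = sum(allocation((k + 0.5) * width) for k in range(steps)) * width
    return bid * allocation(bid) - integral


# Toy curve 1: linear allocation A(t) = t.
linear_payment = myerson_payment(lambda t: t, 1.0)

# Toy curve 2: the agent wins (allocation 1) iff the bid is at least 0.3,
# so the payment should be close to the threshold price 0.3.
threshold_payment = myerson_payment(lambda t: 1.0 if t >= 0.3 else 0.0, 1.0)
```

For the threshold curve the computed payment coincides (up to discretization) with the critical bid, which is the familiar special case of the identity.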



□ 



4 Multi-parameter MAB mechanisms 



Let us define a natural multi-parameter extension to the MAB mechanism design problem studied in (Babaioff et al.,
2009; Devanur and Kakade, 2009; Babaioff et al., 2010).$^5$



Problem formulation. There are n agents. For each agent there is a known and fixed set of ads he is 
interested in, and without loss of generality these sets are disjoint. The total number of ads is denoted by m. 

As is common in the literature on sponsored search we assume that agents only value clicks; they have 
no value for an impression when the ad is not clicked. For every ad $j$ there is a value-per-click $v_j$ such that
the unique agent that is interested in that ad receives utility $v_j$ whenever this ad is clicked; this value is the
agent's private information. 

A mechanism for this domain proceeds as follows. There are T rounds, where the time horizon T is 
fixed and known to everyone. In each round the mechanism either decides to skip this round or chooses one 
ad to display. Then the ad is either clicked or not clicked. All agents bid once, before the first round. The 
bid of a given agent consists of a tuple of reported values for his ads. The bid reported for ad $j$ is denoted
$b_j$; the entire bid vector of all agents for the $m$ ads is denoted $b = (b_1, \ldots, b_m)$. Payments are assigned
after the last round. 

4 Note that the proof of cycle-monotonicity of $\tilde{\mathcal{A}}$ did not use any other property from Theorem 2.2 other than that it is
bid-resampling. The truthfulness of the single-parameter mechanisms $(\tilde{\mathcal{A}}_b, \tilde{\mathcal{P}}_b)$ is used in the forthcoming argument about payments.
5 Here and elsewhere, MAB stands for multi-armed bandits.

For each ad $j$, the click probability is fixed over time and denoted $\mu_j$. In each round when this ad is
displayed, it is clicked independently with probability $\mu_j$. Click probabilities are called click-through rates
(CTRs) in the industry. We assume that the CTRs are known neither to the mechanism nor to the agents.
For brevity, let $\mu = (\mu_1, \ldots, \mu_m)$ be the vector of all CTRs.



Interpretation as a multi-parameter domain. For our setting, stochastic truthfulness (and similarly
stochastic CMON, etc.) is a property that holds in expectation over clicks, for all possible CTR vectors $\mu$.

Following the prior work, the external seed is defined as click realization $\rho$, in the following sense. For
every ad $j$ and every round $t$, realization $\rho(t, j) \in \{0, 1\}$ says whether this ad would be clicked if it is shown
in this round. In particular, ex-post truthfulness corresponds to truthfulness for every click realization. Note
that a given run of a mechanism does not observe the entire click realization: it only observes clicks for ads
that are displayed in a given round.
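To make the role of the external seed concrete, the following Python sketch samples a click realization from a CTR vector and shows that a run reveals only the entries for ads that were displayed. It is an illustration under our own assumptions: the function names `sample_click_realization` and `observed_clicks`, the list-of-lists encoding of the realization, and the use of `None` for a skipped round are not part of the model's definition.

```python
import random


def sample_click_realization(ctrs, horizon, rng):
    """Return rho with rho[t][j] = 1 iff ad j would be clicked if shown in round t.

    Each entry is an independent Bernoulli draw with success probability ctrs[j].
    """
    return [[1 if rng.random() < mu else 0 for mu in ctrs] for _ in range(horizon)]


def observed_clicks(rho, schedule):
    """Return what a single run observes.

    schedule[t] is the index of the ad shown in round t, or None for a skip.
    Only the entry rho[t][schedule[t]] is revealed; a skip reveals nothing.
    """
    return [None if ad is None else rho[t][ad] for t, ad in enumerate(schedule)]


rng = random.Random(0)
rho = sample_click_realization([0.0, 1.0], horizon=3, rng=rng)
feedback = observed_clicks(rho, [0, None, 1])
```

The remaining entries of `rho` stay hidden from the mechanism, which is why the allocation rule can only be called once on a given realization.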

For every bid vector $b$ and each click realization $\rho$, let $C_j(b, \rho)$ be the expected total number of clicks
received by ad $j$, where the expectation is over the internal randomness in the mechanism. Denote $C(b, \rho) =
(C_1(b, \rho), \ldots, C_m(b, \rho))$ and call it the click vector. We interpret the click vectors as the "outcomes"
in the multi-parameter domain. Note that a given click vector $C(b, \rho)$ corresponds to expected welfare $\sum_j v_j\, C_j(b, \rho)$.

Note that with this interpretation of the "outcomes", the allocation rule is not free to choose any well-defined
outcome. Instead, the collection of outcomes that can be implemented on a given run of the mechanism
is constrained by the click realization.$^6$

For a given CTR vector $\mu$, let $C(b, \mu) = \mathbb{E}_{\rho \sim \mu}\, C(b, \rho)$, where the expectation is taken over click realizations
$\rho$ according to the corresponding CTRs. Call it the $\mu$-click vector. In expectation over the clicks, the
welfare is $\sum_j v_j\, C_j(b, \mu)$. When considering stochastic truthfulness, it will be more convenient to re-define
outcomes as $\mu$-click vectors.



Discussion and background. If not for the issue of incentives and the requirement of truthfulness, the 
welfare-maximization problem for the allocation rule is precisely the multi-armed bandit problem (hence- 
forth, MAB), a well-studied problem in Machine Learning and Operations Research. MAB mechanisms 
can be seen as a version of the MAB problem that incorporates incentives. MAB mechanisms (in the limited
single-parameter case, with one ad per agent) were introduced and studied in (Babaioff et al., 2009;
Devanur and Kakade, 2009) for the deterministic case. Subsequently, Babaioff et al. (2010) studied randomized
MAB mechanisms. Below we recap some of the contributions made in (Babaioff et al., 2009;
Devanur and Kakade, 2009).

MAB mechanisms were suggested as a simple model in which one can study the interplay between 
incentives and learning, two major issues that arise in pay-per-click auctions. Pay-per-click is (along with 
pay-per-impression) one of the two prevalent business models in the advertising on the Internet, and the 
prevalent pricing model in sponsored search. Compared to pay-per-impression, pay-per-click reduces the 
risk that advertisers take, as they only pay when the ad is clicked. The seller, who has some control over 
clicks, bears the risk instead. Moreover, advertisers typically have very little or no information about their 
CTRs, and should not be required to learn more. The pay-per-click model essentially shields the advertisers 
from this uncertainty. 



6 Alternatively, we could have defined "outcomes" via impressions rather than clicks. But then an agent would not have a
full knowledge of his value for each outcome (his type) as the CTRs are not known to him. Such a definition necessitates some
cumbersome modifications to the framework in Section 2. Both versions lead to the same results.

The crucial assumption in our model of MAB mechanisms is that the CTRs are initially not known to
the mechanism. This assumption reflects the fact that the CTRs are learned over time, while the ads are
being allocated, and so the process of learning should be treated as a part of the game.$^7$

The focus of the investigation in (Babaioff et al., 2009; Devanur and Kakade, 2009) was whether and
how the requirement of truthfulness restricts the performance of MAB algorithms when types are
single-parameter. They found a very severe restriction for deterministic, ex-post truthful mechanisms: the
allocation rule can only have a very simple, "naive" structure (separating exploration and exploitation),
which severely impacts performance compared to the best MAB algorithms. They capitalize on the
"no-simulation" constraint to prove that if an allocation rule does not conform to this simple structure, then a
truthful mechanism with this allocation rule cannot be information-feasible.



The obstacle of information-feasibility for the single-parameter case is circumvented in Babaioff et al.
(2010) by moving from deterministic to randomized MAB mechanisms. The single-parameter transformation
(Theorem 2.2) reduces the design of truthful, information-feasible MAB mechanisms to the design of
monotone allocation rules for this domain. Further, the authors provide monotone allocation rules whose
performance matches that of optimal MAB algorithms. Specifically, they show that (a minor modification
of) a standard MAB algorithm UCB1 (Auer et al., 2002) is stochastically monotone, and they design a new
MAB algorithm which is ex-post monotone and has essentially the same performance.



5 Ex-Post Truthful Multi-parameter MAB Mechanisms: 
An Impossibility Result 

In this section we present our second main contribution: a very strong impossibility result for ex-post truthful 
multi-parameter MAB mechanisms. Consider one of the agents and fix the bids of the others. Essentially, 
we show that an allocation rule which satisfies ex-post CMON for that agent cannot depend on the bid of that
agent. More precisely, this holds for a deterministic allocation rule if the bids are large enough, as well as
for any allocation rule (deterministic or randomized) that never skips a round. For randomized allocation
rules that may skip a round, we show that if the allocation rule satisfies ex-post CMON then it cannot achieve
a nontrivial worst-case approximation ratio. 

Theorem 5.1. Let $\mathcal{A}$ be an allocation rule for multi-parameter MAB which satisfies ex-post CMON. Fix any
agent i, and fix bids submitted by all other agents. 

(a) If A is any allocation rule (deterministic or randomized) that never skips a round, and if agent i is the 
only agent, then the allocation has no dependence on his bids. 

(b) If A is deterministic, then there exists a finite B such that the allocation for agent i does not depend 
on his bids, as long as all his bids are larger than B. 

(c) If A is randomized, then its worst-case approximation ratio (over all bid vectors of agent i) is no 
better than that of the trivial randomized allocation rule that ignores agent i's bid, samples one of his 
ads uniformly at random, and allocates all impressions to that ad. 

The first conclusion presumes there is only a single agent, and to prove the remaining two conclusions
it suffices to consider the case of a single agent, because from the perspective of any given agent the ads
allocated to other agents can be represented as skips. (In particular, allowing skips in single-agent allocation
rules is essential for the generalization to multiple agents.) In the rest of this section we assume a single
agent with $m$ ads.

7 If some information on CTRs is known before the allocation starts, this can be modeled via Bayesian priors on CTRs. Following
(Babaioff et al., 2009; Devanur and Kakade, 2009; Babaioff et al., 2010), we focus on the non-Bayesian version.

To prove our result we need to set up some notation. Recall that the bids of the agent are represented
by a vector $b = (b_1, \ldots, b_m) \in \mathbb{R}_+^m$. For a given allocation rule $\mathcal{A}$ and a given click-realization $\rho$, the
impression allocation $\mathcal{A}(b, t, \rho) \in [0, 1]^m$ is a vector of probabilities, in expectation over the random seed of
the algorithm, so that $\mathcal{A}_i(b, t, \rho)$ is the probability that ad $i$ is chosen in round $t$ given bid vector $b$ and
realization $\rho$.



Weak monotonicity. We use CMON through the special case $k = 2$ of its defining inequality; this special case
is known in the literature as weak monotonicity, henceforth abbreviated WMON. WMON is equivalent to CMON
if there are finitely many outcomes and the type space is convex (Saks and Yu, 2005). It follows that in our
setting, ex-post WMON is equivalent to ex-post CMON for deterministic allocation rules. For more background
on WMON, see (Archer and Kleinberg, 2008a).

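Let us restate WMON in the notation of multi-parameter MAB mechanisms. Recall that the click vector
$C(b, \rho)$ is a vector such that $C_j(b, \rho)$ is the total expected number of clicks for ad $j$, given bid vector $b$ and
realization $\rho$. Then

$$C_j(b, \rho) = \sum_{t=1}^{T} \rho(t, j)\, \mathcal{A}_j(b, t, \rho), \qquad \text{that is,} \qquad C(b, \rho) = \sum_{t=1}^{T} \Delta_t(\rho)\, \mathcal{A}(b, t, \rho),$$

where $\Delta_t(\rho)$ is the $m \times m$ diagonal matrix with diagonal entries $(\rho(t, 1), \ldots, \rho(t, m))$. Ex-post WMON
states the following: for any realization $\rho$ and any bid vectors $b, \tilde{b} \in \mathbb{R}_+^m$,

$$(b - \tilde{b}) \cdot \big( C(b, \rho) - C(\tilde{b}, \rho) \big) \ge 0.$$

Re-writing this in terms of the impression allocation, we obtain:

$$(b - \tilde{b})^\dagger \sum_{t=1}^{T} \Delta_t(\rho) \big( \mathcal{A}(b, t, \rho) - \mathcal{A}(\tilde{b}, t, \rho) \big) \ge 0. \tag{8}$$

Here and elsewhere, $M^\dagger$ denotes the transpose of a matrix $M$.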
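As an illustration, the left-hand side of Equation (8) can be evaluated directly for a fixed realization. The Python sketch below is ours: the names `click_vector` and `wmon_slack`, and the encoding of the realization and of the impression allocations as lists indexed by round and ad, are assumptions made for the example.

```python
def click_vector(alloc, rho):
    """C_j = sum over rounds t of rho[t][j] * alloc[t][j].

    alloc[t][j] is the probability that ad j is shown in round t;
    rho[t][j] is 1 iff ad j would be clicked if shown in round t.
    """
    num_ads = len(rho[0])
    return [sum(rho[t][j] * alloc[t][j] for t in range(len(rho)))
            for j in range(num_ads)]


def wmon_slack(bids, alt_bids, alloc, alt_alloc, rho):
    """Left-hand side of the WMON inequality; ex-post WMON requires it to be >= 0."""
    clicks = click_vector(alloc, rho)
    alt_clicks = click_vector(alt_alloc, rho)
    return sum((b - a) * (c - d)
               for b, a, c, d in zip(bids, alt_bids, clicks, alt_clicks))


# One round, two ads, only ad 0 is clicked. Raising both bids from 1 to 2
# moves the impression away from the clicked ad, which violates WMON.
slack = wmon_slack(bids=[1, 1], alt_bids=[2, 2],
                   alloc=[[1, 0]], alt_alloc=[[0, 1]],
                   rho=[[1, 0]])
```

The toy instance has the same shape as the violations constructed in the proofs in this section: a realization in which a single ad is clicked in the critical round, and a higher bid vector that gives that ad a smaller share of the impression.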



Analysis for allocation rules with no skips (Theorem 5.1(a)). The first part of Theorem 5.1 follows
directly from the following claim.

Claim 5.2. Let $\mathcal{A}$ be a single-agent allocation rule (possibly randomized) which satisfies ex-post WMON and
never skips a round. Then for each click-realization $\rho$ and each round $t$, $\mathcal{A}$ does not depend on the bid vector
$b$.

Proof. For the sake of contradiction, assume that $\mathcal{A}(b, t, \rho) \ne \mathcal{A}(b', t, \rho)$ for some round $t$, click-realization
$\rho$, and bid vectors $b, b' \in \mathbb{R}_+^m$. Pick the smallest $t$ for which such a counterexample exists. Assume w.l.o.g.
that $\rho = 0$ for all rounds after $t$. For each ad $i$, let $\rho_i$ be a realization that coincides with $\rho$ on all rounds but
$t$, and in round $t$ ad $i$ is clicked and all other ads are not clicked.

Let $\tilde{b} = \vec{1} + \max(b, b') \in \mathbb{R}_+^m$, where $\max(b, b')$ is the coordinate-wise maximum of $b$ and $b'$. Since
$\mathcal{A}(b, t, \rho) \ne \mathcal{A}(b', t, \rho)$, we can w.l.o.g. assume that $\mathcal{A}(b, t, \rho) \ne \mathcal{A}(\tilde{b}, t, \rho)$. Since $\mathcal{A}$ never skips a round,

$$\sum_{i=1}^{m} \mathcal{A}_i(b, t, \rho) = 1 = \sum_{i=1}^{m} \mathcal{A}_i(\tilde{b}, t, \rho). \tag{9}$$

Combining $\mathcal{A}(b, t, \rho) \ne \mathcal{A}(\tilde{b}, t, \rho)$ with Equation (9) we deduce that for some ad $i$, $\mathcal{A}_i(\tilde{b}, t, \rho) < \mathcal{A}_i(b, t, \rho)$.
We claim that WMON is violated for bids $b, \tilde{b}$ and realization $\rho_i$. Indeed, consider Equation (8) for realization
$\rho_i$. The sum in Equation (8) is zero for all rounds other than $t$ because $\mathcal{A}(b, s, \rho) = \mathcal{A}(\tilde{b}, s, \rho)$ for all rounds
$s < t$ (by minimality of $t$), and $\rho_i = 0$ for all rounds $s > t$. For round $t$, the sum in Equation (8) is zero for all
ads other than $i$, by definition of $\rho_i$. Thus, the sum is simply equal to $(b_i - \tilde{b}_i) \cdot [\mathcal{A}_i(b, t, \rho) - \mathcal{A}_i(\tilde{b}, t, \rho)]$,
which is negative, contradicting Equation (8). □

Analysis for the deterministic case (Theorem 5.1(b)). We now address deterministic allocation rules
that may skip rounds. The analysis of this case captures the main ideas of the randomized case while being 
significantly easier to present. 

We will use the following shorthand. For a vector $b = (b_1, \ldots, b_m) \in \mathbb{R}_+^m$, denote $\min(b) =
\min_{1 \le i \le m} b_i$. For two vectors $b, \tilde{b} \in \mathbb{R}_+^m$, let $\min(b, \tilde{b}) \in \mathbb{R}_+^m$ be the coordinate-wise minimum of $b$ and $\tilde{b}$.
Use the same notation for $\max(b)$ and $\max(b, \tilde{b})$. Let $b \succ \tilde{b}$ denote coordinate-wise ">", that is, $b_i > \tilde{b}_i$ for
all $i$.

Fix click-realization p and round t. Let A be a deterministic allocation rule for a single agent. If A skips 
round t, write A(b, t, p) = skip. 

One technicality in the analysis is handling skips; we deal with it using the following notions:$^8$

$$b_{\min}(t, \rho) = \sup\{ \max(b) : b \in \mathbb{R}_+^m \text{ and } \mathcal{A}(b, t, \rho) = \text{skip} \}.$$

$$B = \max\big( \{0\} \cup \{ b_{\min}(t, \rho) : \exists\, t, \rho \text{ such that } b_{\min}(t, \rho) < \infty \} \big). \tag{10}$$

Note that $B = 0$ if $b_{\min}(t, \rho) = \infty$ for all $t$ and $\rho$. For a given round $t$ and realization $\rho$, $b_{\min}(t, \rho)$ is defined
such that if all $m$ bids are larger than $b_{\min}(t, \rho)$ then the allocation does not skip at round $t$ on realization $\rho$.
$B$ is defined such that for every realization and every round, if all bids are larger than $B$ then the allocation
rule never skips.

Claim 5.3. Let $\mathcal{A}$ be a deterministic single-agent allocation rule which satisfies ex-post WMON. Then for each
click-realization $\rho$ and each round $t$, $\mathcal{A}$ does not depend on the bid vector $b$ for all bid vectors $b \in (B, \infty)^m$,
where $B$ is defined in Equation (10).

Proof. For the sake of contradiction, assume that $\mathcal{A}(b, t, \rho) \ne \mathcal{A}(b', t, \rho)$ for some round $t$, click-realization
$\rho$, and bid vectors $b, b' \in (B, \infty)^m$. Pick the smallest $t$ for which such a counterexample exists. Assume
w.l.o.g. that $\rho = 0$ for all rounds after $t$. For each ad $i$, let $\rho_i$ be a realization such that it coincides with $\rho$ on
all rounds but $t$, and in round $t$ ad $i$ is clicked and all other ads are not clicked.
Let us consider two cases, depending on whether $b_{\min}(t, \rho)$ is finite.

Case 1: $b_{\min}(t, \rho) = \infty$. At least one of $\mathcal{A}(b, t, \rho)$, $\mathcal{A}(b', t, \rho)$ is not equal to skip. Since $\mathcal{A}(b, t, \rho) \ne
\mathcal{A}(b', t, \rho)$, we can w.l.o.g. assume that $\mathcal{A}(b, t, \rho) \ne \text{skip}$. Hence, $\mathcal{A}_i(b, t, \rho) = 1$ for some ad $i$. Since
$b_{\min}(t, \rho) = \infty$, there exists $\tilde{b} \in (\max(b), \infty)^m$ such that $\mathcal{A}(\tilde{b}, t, \rho) = \text{skip}$.

We claim WMON is violated for bids $b, \tilde{b}$ and realization $\rho_i$. As in the proof of Claim 5.2, we see that the sum
in Equation (8) is zero for all rounds other than $t$, and for round $t$ the sum is zero for all ads other than $i$. Again, it
follows that the sum is simply equal to $b_i - \tilde{b}_i$, which is negative, contradicting Equation (8). Claim proved.

Case 2: $b_{\min}(t, \rho) < \infty$. The proof of this case is very similar to the proof of Claim 5.2.

Recall that in this case $b_{\min}(t, \rho) < \infty$. Let $\tilde{b} = \vec{1} + \max(b, b') \in \mathbb{R}_+^m$, where $\max(b, b')$ is
the coordinate-wise maximum of $b$ and $b'$. Since $b_{\min}(t, \rho) < \infty$, it follows that $B \ge b_{\min}(t, \rho)$, so neither
$\mathcal{A}(b, t, \rho)$ nor $\mathcal{A}(b', t, \rho)$ nor $\mathcal{A}(\tilde{b}, t, \rho)$ is equal to skip. Since $\mathcal{A}(b, t, \rho) \ne \mathcal{A}(b', t, \rho)$, we can w.l.o.g.
assume that $\mathcal{A}(b, t, \rho) \ne \mathcal{A}(\tilde{b}, t, \rho)$. In particular, $\mathcal{A}_i(\tilde{b}, t, \rho) = 0$ and $\mathcal{A}_i(b, t, \rho) = 1$ for some ad $i$.

We claim that WMON is violated for bids $b, \tilde{b}$ and realization $\rho_i$. Indeed, consider Equation (8) for realization
$\rho_i$. The sum in Equation (8) is zero for all rounds other than $t$ because $\mathcal{A}(b, s, \rho) = \mathcal{A}(\tilde{b}, s, \rho)$ for all
rounds $s < t$ (by minimality of $t$), and $\rho_i = 0$ for all rounds $s > t$. For round $t$, the sum in Equation (8)
is zero for all ads other than $i$, by definition of $\rho_i$. Thus, the sum is simply equal to $b_i - \tilde{b}_i$, which is negative,
contradicting Equation (8). Claim proved. □

8 We use a standard convention that $\sup(\emptyset) = -\infty$.



5.1 Analysis of the randomized case: proof of Theorem 5.1(c)

The proof of the randomized case of Theorem 5.1 is technically more involved than the proof of
Theorem 5.1(b). In particular, even stating the analog of Claim 5.2 requires a considerable amount of setup.

Define functions $f, g, G : \mathbb{N} \times \mathbb{R}_+ \to \mathbb{R}_+$ by the following recurrence: $f(0, y) = g(0, y) = G(0, y) = 0$
for all $y$; while for $t > 0$:

$$f(t, y) = 3ym\, G(t-1, y) + 1$$

$$g(t, y) = 2 f(t, y) + 2 + 3ym\, G(t-1, y)$$

$$G(t, y) = g(t, y) + G(t-1, y) = \sum_{s=1}^{t} g(s, y).$$
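The recurrence is straightforward to tabulate, which helps to see how quickly the error terms grow with the round index. The following Python sketch is an illustration; the function name `recurrence_tables` is ours, and the number of ads $m$, which is implicit in the definition above, is passed as an explicit argument.

```python
def recurrence_tables(horizon, y, m):
    """Tabulate f(t, y), g(t, y), G(t, y) for t = 0, ..., horizon.

    Returns three lists indexed by t, with f[0] = g[0] = G[0] = 0.
    """
    f, g, G = [0], [0], [0]
    for t in range(1, horizon + 1):
        f_t = 3 * y * m * G[t - 1] + 1
        g_t = 2 * f_t + 2 + 3 * y * m * G[t - 1]
        f.append(f_t)
        g.append(g_t)
        G.append(g_t + G[t - 1])  # equivalently, the running sum of g(1..t)
    return f, g, G


f, g, G = recurrence_tables(horizon=3, y=1, m=1)
```

With $y = m = 1$ the first values are $f(1) = 1$, $g(1) = 4$, $G(1) = 4$ and $f(2) = 13$, $g(2) = 40$, $G(2) = 44$; the growth is exponential in $t$, with a base that increases with $ym$.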

For real numbers $B \ge 0$, $y \ge 1$, let $\mathcal{D}(B, y)$ denote the set

$$\mathcal{D}(B, y) = \{ b : \min(b) \ge B,\ \max(b)/\min(b) \le y \}.$$

We will refer to a bid vector as "$y$-balanced" if it satisfies $\max(b)/\min(b) \le y$.

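Let $\mathcal{A}$ be a (potentially randomized) allocation rule. Fix realization $\rho$. For all times $t$ and all $\varepsilon > 0$, $y \ge 1$
let

$$A_{\max}(t, \rho, y) = \lim_{x \to \infty} \sup \{ \|\mathcal{A}(b, t, \rho)\|_1 : b \in \mathcal{D}(x, y) \}$$

$$b_{\max}(t, \rho, \varepsilon, y) = \sup \{ x : \exists\, b \in \mathcal{D}(x, y),\ \|\mathcal{A}(b, t, \rho)\|_1 < A_{\max}(t, \rho, y) - \varepsilon f(t, y) \}$$

$$B_\varepsilon(y) = \begin{cases} 0 & \text{if } b_{\max}(t, \rho, \varepsilon, y) = \infty \text{ for all } t, \rho, \\ \sup\big( \mathbb{R} \cap \{ b_{\max}(t, \rho, \varepsilon, y) : t \in \mathbb{N},\ \rho \} \big) & \text{otherwise.} \end{cases}$$

Here $A_{\max}(t, \rho, y)$ is the maximal expected number of impressions at time $t$ such that the agent can obtain
this number with arbitrarily large $y$-balanced bids. The meaning of $b_{\max}$ is as follows: if every component
of a $y$-balanced vector $b$ is above $b_{\max}$, the expected number of impressions for time $t$ is guaranteed to be
within $\varepsilon f(t, y)$ of the best possible. Note that $B_\varepsilon(y) = 0$ if $b_{\max}(t, \rho, \varepsilon, y)$ is infinite for all $t$ and all $\rho$.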

Claim 5.4. Let $\mathcal{A}$ be a single-agent allocation rule which satisfies ex-post WMON. Then for any $y \ge 1$, any
$\varepsilon > 0$, any bid vectors $b, b' \in \mathcal{D}(B_\varepsilon(y), y)$, any realization $\rho$, and any round $t$, we have

$$\|\mathcal{A}(b, t, \rho) - \mathcal{A}(b', t, \rho)\|_1 \le \varepsilon\, g(t, y).$$

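Proof. Fix $\varepsilon > 0$ and realization $\rho$. Let us use induction on $t$. Case $t = 0$ is trivial, interpreting $\mathcal{A}(b, 0, \rho) = 0$
for all $\rho, b$. Now assume the claim is true for all times $s < t$. For the sake of contradiction, assume the
claim does not hold for time $t$ and some realization $\rho$.

By definition of $A_{\max}$, there exists a number $M^*$ such that

$$\sup_{b \in \mathcal{D}(M^*, y)} \|\mathcal{A}(b, t, \rho)\|_1 < A_{\max}(t, \rho, y) + \varepsilon.$$

For each ad $i$, define a new realization $\rho_i$ as follows: it coincides with $\rho$ before time $t$, only $i$ gets clicked at
time $t$, and there are no clicks after $t$.

Fix bid vectors $b, b' \in \mathcal{D}(B_\varepsilon(y), y)$. Pick some bid vector $\tilde{b} \in \mathcal{D}(M, y)$, where

M = max(M*, 3y ‖b + …

Let $\tilde{M} = \max(\tilde{b})$.

WMON for realization $\rho_i$, applied to bid vectors $b$ and $\tilde{b}$, states the following:

$$(\tilde{b} - b)^\dagger\, \Delta_t(\rho_i)\, \big( \mathcal{A}(\tilde{b}, t, \rho) - \mathcal{A}(b, t, \rho) \big) \tag{11}$$

$$+\ (\tilde{b} - b)^\dagger \sum_{s=1}^{t-1} \Delta_s(\rho)\, \big( \mathcal{A}(\tilde{b}, s, \rho) - \mathcal{A}(b, s, \rho) \big) \ \ge\ 0. \tag{12}$$

The first summand in Equation (11) is simply $(\tilde{b}_i - b_i) \big( \mathcal{A}_i(\tilde{b}, t, \rho) - \mathcal{A}_i(b, t, \rho) \big)$.
By the induction hypothesis, for each time $s < t$ it holds that

$$(\tilde{b} - b)^\dagger\, \Delta_s(\rho)\, \big( \mathcal{A}(\tilde{b}, s, \rho) - \mathcal{A}(b, s, \rho) \big) \le \tilde{M} \varepsilon\, g(s, y).$$

It follows that

$$(\tilde{b} - b)^\dagger \sum_{s=1}^{t-1} \Delta_s(\rho)\, \big( \mathcal{A}(\tilde{b}, s, \rho) - \mathcal{A}(b, s, \rho) \big) \le \tilde{M} \varepsilon \sum_{s=1}^{t-1} g(s, y) = \tilde{M} \varepsilon\, G(t-1, y).$$

Plugging this into Equation (11), we obtain

$$(\tilde{b}_i - b_i) \big( \mathcal{A}_i(\tilde{b}, t, \rho) - \mathcal{A}_i(b, t, \rho) \big) \ge -\tilde{M} \varepsilon\, G(t-1, y)$$

$$\mathcal{A}_i(\tilde{b}, t, \rho) - \mathcal{A}_i(b, t, \rho) \ge -(\tilde{b}_i - b_i)^{-1}\, \tilde{M} \varepsilon\, G(t-1, y).$$

We have $\tilde{b}_i \ge \tilde{M}/y$ since $\tilde{b}$ is $y$-balanced. Also $b_i \le M/(3y)$ by our choice of $M$, and $M \le \tilde{M}$. Therefore $\tilde{b}_i - b_i \ge \frac{2\tilde{M}}{3y}$
and

$$\mathcal{A}_i(\tilde{b}, t, \rho) - \mathcal{A}_i(b, t, \rho) \ge -\tfrac{3y}{2}\, \varepsilon\, G(t-1, y). \tag{13}$$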

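Case 1: $b_{\max}(t, \rho, \varepsilon, y) < \infty$. Denote $X_i = \mathcal{A}_i(b, t, \rho)$ and $X'_i = \mathcal{A}_i(b', t, \rho)$. In this notation, our goal is
to bound $\|X - X'\|_1$ above by $\varepsilon\, g(t, y)$. Assume $\tilde{b} = \tilde{M} \cdot \vec{1}$ for some $\tilde{M} \ge M$. By Equation (13), noting that
this argument applies to both $b$ and $b'$, we have:

$$\mathcal{A}_i(\tilde{b}, t, \rho) \ge \max(X_i, X'_i) - \tfrac{3y}{2}\, \varepsilon\, G(t-1, y).$$

Summing this over all ads:

$$\|\mathcal{A}(\tilde{b}, t, \rho)\|_1 \ge \|\max(X, X')\|_1 - \tfrac{3y}{2}\, \varepsilon\, m\, G(t-1, y).$$

Recall that $\|\mathcal{A}(\tilde{b}, t, \rho)\|_1 \le A_{\max}(t, \rho, y) + \varepsilon$ by our choice of $M$. Therefore:

$$\|\max(X, X')\|_1 \le A_{\max}(t, \rho, y) + \varepsilon \big( 1 + \tfrac{3y}{2}\, m\, G(t-1, y) \big).$$

Note that

$$\|X\|_1 + \|X'\|_1 = \|\max(X, X')\|_1 + \|\min(X, X')\|_1$$

$$\|X - X'\|_1 = \|\max(X, X')\|_1 - \|\min(X, X')\|_1$$

$$\|X\|_1 + \|X'\|_1 + \|X - X'\|_1 = 2\, \|\max(X, X')\|_1.$$

Because $b_{\max}(t, \rho, \varepsilon, y) < \infty$ and $b, b' \in \mathcal{D}(B_\varepsilon(y), y)$, both $\|X\|_1$ and $\|X'\|_1$ are at least $A_{\max}(t, \rho, y) -
\varepsilon f(t, y)$. Therefore:

$$2 A_{\max}(t, \rho, y) - 2 \varepsilon f(t, y) + \|X - X'\|_1 \le 2\, \|\max(X, X')\|_1 \le 2 A_{\max}(t, \rho, y) + 2 \varepsilon \big( 1 + \tfrac{3y}{2}\, m\, G(t-1, y) \big).$$

It follows that

$$\|X - X'\|_1 \le 2 \varepsilon \big( 1 + f(t, y) + \tfrac{3y}{2}\, m\, G(t-1, y) \big) = \varepsilon\, g(t, y).$$

Thus, we have proved the induction step assuming $b_{\max}(t, \rho, \varepsilon, y)$ is finite.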

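Case 2: $b_{\max}(t, \rho, \varepsilon, y) = \infty$. This case is impossible: we will arrive at a contradiction.
By definition of $A_{\max}$, there exists a bid vector $b \in \mathcal{D}(B_\varepsilon(y), y)$ such that

$$\|\mathcal{A}(b, t, \rho)\|_1 > A_{\max}(t, \rho, y) - \tfrac{1}{2}\, \varepsilon f(t, y).$$

Since $b_{\max}(t, \rho, \varepsilon, y) = \infty$, we can pick $\tilde{b} \in \mathcal{D}(M, y)$ such that

$$\|\mathcal{A}(\tilde{b}, t, \rho)\|_1 < A_{\max}(t, \rho, y) - \varepsilon f(t, y) < \|\mathcal{A}(b, t, \rho)\|_1 - \tfrac{1}{2}\, \varepsilon f(t, y).$$

It follows that

$$\sum_{i=1}^{m} \big[ \mathcal{A}_i(b, t, \rho) - \mathcal{A}_i(\tilde{b}, t, \rho) \big] > \tfrac{1}{2}\, \varepsilon f(t, y)$$

$$\exists\, i: \quad \mathcal{A}_i(b, t, \rho) - \mathcal{A}_i(\tilde{b}, t, \rho) > \tfrac{1}{2m}\, \varepsilon f(t, y).$$

Using Equation (13), for this $i$ we have:

$$\tfrac{1}{2m}\, \varepsilon f(t, y) < \mathcal{A}_i(b, t, \rho) - \mathcal{A}_i(\tilde{b}, t, \rho) \le \tfrac{3y}{2}\, \varepsilon\, G(t-1, y).$$

Thus, $f(t, y) < 3ym\, G(t-1, y)$, contradicting the definition of $f$. □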
Using Claim 5.4 it is now easy to prove Theorem 5.1(c).

Proof of Theorem 5.1(c). For any $\delta > 0$, let

$$y = 2m/\delta, \qquad \varepsilon = \frac{\delta}{2m\, g(T, y)}, \qquad B = B_\varepsilon(y).$$

In our proof we will consider applying $\mathcal{A}$ to the bid vector $b^0 = B \cdot \vec{1}$ as well as the vectors $b^j$ defined for
$j = 1, \ldots, m$ by changing the $j$-th component of $b^0$ from $B$ to $yB$. The vectors $b^0, \ldots, b^m$ all belong to $\mathcal{D}(B, y)$.

Let $\rho$ be a realization such that $\rho(t, j) = 1$ for all $t, j$, i.e. every ad is always clicked. Since $\mathcal{A}$ can
never allocate more than $T$ impressions, we have $\sum_{t=1}^{T} \sum_{i=1}^{m} \mathcal{A}_i(b^0, t, \rho) \le T$. Hence, there is at least one
$j \in [m]$ such that

$$\sum_{t=1}^{T} \mathcal{A}_j(b^0, t, \rho) \le T/m. \tag{14}$$

Now, for every round $t$, we have

$$\mathcal{A}_j(b^j, t, \rho) - \mathcal{A}_j(b^0, t, \rho) \le \|\mathcal{A}(b^j, t, \rho) - \mathcal{A}(b^0, t, \rho)\|_1 \le \varepsilon\, g(t, y) \le \varepsilon\, g(T, y) = \frac{\delta}{2m}, \tag{15}$$

where the second inequality follows from Claim 5.4. Summing Equation (15) over $t = 1, \ldots, T$ and
combining with Equation (14), we deduce that

$$\sum_{t=1}^{T} \mathcal{A}_j(b^j, t, \rho) \le \big( 1 + \tfrac{\delta}{2} \big) \tfrac{T}{m}.$$

The optimal allocation for bid vector $b^j$ assigns every impression to ad $j$, achieving a total value of $yBT$.
Instead, the allocation computed by $\mathcal{A}$ achieves a total value bounded above by $(1 + \tfrac{\delta}{2}) \tfrac{yBT}{m} + BT$, where the
first term accounts for impressions allocated to ad $j$ and the second term accounts for all other impressions.
We have

$$\big( 1 + \tfrac{\delta}{2} \big) \tfrac{yBT}{m} + BT = \tfrac{yBT}{m} \cdot \big( 1 + \tfrac{\delta}{2} + \tfrac{m}{y} \big) = \tfrac{yBT}{m} \cdot \big( 1 + \tfrac{\delta}{2} + \tfrac{\delta}{2} \big) = yBT \cdot \tfrac{1 + \delta}{m}.$$

Since $\delta > 0$ was an arbitrarily small positive constant, we conclude that the worst-case approximation ratio
of $\mathcal{A}$ is no better than $1/m$, which is trivially achieved by a random allocation. □
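The arithmetic at the end of this argument can be checked with exact rational numbers. The Python sketch below is only a sanity check of the final identity for the choice $y = 2m/\delta$; the function name `lower_bound_values` and the sample parameters are our assumptions.

```python
from fractions import Fraction


def lower_bound_values(m, delta, B, T):
    """Return (y, optimum, achieved) for the bid vector with one bid yB and the rest B.

    optimum  : value of giving all T impressions to the ad with bid yB
               (every ad is always clicked in the realization considered).
    achieved : upper bound on the value obtained when that ad receives at most
               (1 + delta/2) * T / m impressions and every other impression is worth B.
    """
    delta = Fraction(delta)
    y = Fraction(2 * m) / delta
    optimum = y * B * T
    achieved = (1 + delta / 2) * y * B * T / m + B * T
    return y, optimum, achieved


y, optimum, achieved = lower_bound_values(m=4, delta=Fraction(1, 10), B=1, T=100)
ratio = achieved / optimum  # equals (1 + delta) / m
```

For $m = 4$ and $\delta = 1/10$ the ratio is $11/40$, which approaches $1/m = 1/4$ as $\delta$ tends to zero.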



6 A Stochastic CMON Allocation Rule 
for multi-parameter MAB mechanisms 

In this section we consider the problem of designing stochastically truthful multi-parameter MAB mech- 
anisms. As discussed in the introduction, the VCG mechanism cannot be used as it is informationally- 
infeasible. Additionally, pricing based mechanisms do not seem to be feasible. The only other technique 
that is extensively exploited in the literature for multi-parameter domains is using maximal in distributional 
range (MIDR) allocation rules. We formalize the limitations of a natural family of MIDR allocation rules (in 
which the set of distributions the rule optimizes over is independent of the CTRs) in Section 6.3, showing
that the performance of such rules is no better than randomly selecting an ad to present. We next discuss
some simple approaches to create truthful mechanisms: the first disregards the bids, and the second uses
randomization to reduce the problem to a single parameter problem.

The first approach is bid-independent allocation rules, that is, ones that do not depend on the bids. Among
those, we naturally focus on the allocation rule that achieves the best worst-case performance: the rule that
samples an ad independently and uniformly at random in each round; call it RND.

A slightly more sophisticated approach randomly reduces the problem to a single parameter problem as
follows. One ad is selected independently for each agent, uniformly at random from this agent's ads. Then
some truthful single-parameter mechanism $\mathcal{M}$ is run on the selected ads. Call this mechanism SubSample.
This mechanism is truthful (ex-post or stochastically, same as $\mathcal{M}$) because for each realization of the
selection described above, it is simply a truthful single-parameter mechanism. The performance of this
mechanism is the same as the performance of the trivial RND mechanism when there is only one agent.
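The random choices made by the two naive approaches can be sketched in a few lines of Python. This is an illustration only: the names `rnd_schedule` and `subsample_ads`, and the dictionary encoding of the agents' ad sets, are our assumptions, and the single-parameter mechanism that SubSample runs on the selected ads is deliberately left out.

```python
import random


def rnd_schedule(num_ads, horizon, rng):
    """RND: in each round show an ad drawn independently and uniformly at random."""
    return [rng.randrange(num_ads) for _ in range(horizon)]


def subsample_ads(ads_by_agent, rng):
    """First step of SubSample: keep one uniformly random ad per agent.

    ads_by_agent maps an agent to the (disjoint) list of its ads. A truthful
    single-parameter mechanism would then be run on the returned ads only.
    """
    return {agent: rng.choice(ads) for agent, ads in ads_by_agent.items()}


rng = random.Random(0)
schedule = rnd_schedule(num_ads=3, horizon=5, rng=rng)
selected = subsample_ads({"agent_a": [0, 1], "agent_b": [2]}, rng)
```

Neither function looks at the bids, which is why truthfulness of these steps is immediate.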

These two naive approaches have poor performance. For example, for a single agent neither performs
better than uniformly randomizing over the ads. We call such performance trivial. This gives rise to the
following major open problem.

Open Problem: Design a stochastically truthful mechanism for the multi-parameter MAB problem that 
achieves optimal approximation. 

A more modest goal is to design a stochastically truthful mechanism for the multi-parameter MAB problem
that achieves non-trivial performance, even for some "well-behaved" subset of inputs. Unfortunately,
it seems that all standard tools fail to achieve even this modest goal. Below we achieve this by designing
a stochastically CMON allocation rule and then applying the multi-parameter transformation from Section 3.
We interpret this result as evidence that it is not completely hopeless to significantly improve over the
trivial approaches.

6.1 The stochastically CMON allocation rule 

We design a stochastically CMON allocation rule ALL whose expected welfare exceeds that of RND on all prob- 
lem instances with at least two agents, and that of SubSample on an important family of problem instances 
which we characterize below. Structurally ALL depends on all submitted bids, is provably not MIDR, and, 
unlike SubSample, does not proceed through an explicit reduction to a single-parameter allocation rule. Im- 
plementing ALL as a truthful, information-feasible mechanism requires the full power of our multi-parameter 
transformation. 

All results in this section require all private values to be bounded from above by 1. We will assume that 
without further notice. 

Recap of notation. The term "expected welfare" refers to expectation over the randomness in the allocation
rule and the clicks (for a given vector of CTRs). Let $W(\text{RND})$ denote the expected welfare of RND. Let
$A_0 = \{1, \ldots, m\}$ be the set of $m$ ads of all agents. Recall that $v_j$, $b_j$ and $\mu_j$ denote, respectively, the private
value, the submitted bid, and the CTR for ad $j$. Note that the expected value from each time a given ad $j$ is
displayed is $v_j \mu_j$.

Allocation rule ALL for $\ge 2$ agents. Assume there are at least two agents. Define the following allocation
rule, call it ALL. It consists of two phases: exploration and exploitation. Exploration lasts for $T_0$ rounds,
where $T_0 \ge 1$ is fixed and chosen in advance. In each exploration round an ad is chosen uniformly at random
among all ads. Let $n_j$ be the number of clicks for ad $j$ by the end of the exploration phase. In each round of
exploitation ALL does the following:

(L1) pick each ad $j$ with probability $b_j n_j / T_0$, where $b_j$ is the bid for ad $j$.
(L2) with the remaining probability pick an ad uniformly at random.

This completes the specification of ALL. We note that even a single round of exploration suffices for our
purposes. Using a small $T_0$ does not affect the expected performance, but results in a (very) high variance.
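The two phases of ALL can be sketched as follows. This Python code is a minimal illustration under our own assumptions, not the mechanism itself: the names `exploitation_probs` and `run_all`, the list-of-lists encoding of the click realization, and the clamping of the leftover probability against floating-point rounding are ours. Bids are assumed to lie in $[0, 1]$, so that the probabilities in step (L1) sum to at most 1.

```python
import random


def exploitation_probs(bids, clicks, explore_rounds):
    """Per-ad selection probabilities in an exploitation round.

    (L1) ad j gets probability bids[j] * clicks[j] / explore_rounds;
    (L2) the remaining probability is spread uniformly over all ads.
    """
    num_ads = len(bids)
    direct = [b * n / explore_rounds for b, n in zip(bids, clicks)]
    remaining = max(0.0, 1.0 - sum(direct))  # clamp tiny negative rounding errors
    return [p + remaining / num_ads for p in direct]


def run_all(bids, rho, explore_rounds, rng):
    """Run the sketch on click realization rho; return the ad shown in each round."""
    if not all(0.0 <= b <= 1.0 for b in bids):
        raise ValueError("bids are assumed to lie in [0, 1]")
    num_ads = len(bids)
    clicks = [0] * num_ads  # n_j: clicks observed during exploration only
    shown = []
    for t, row in enumerate(rho):
        if t < explore_rounds:
            ad = rng.randrange(num_ads)  # exploration: uniform over ads
            clicks[ad] += row[ad]
        else:
            weights = exploitation_probs(bids, clicks, explore_rounds)
            ad = rng.choices(range(num_ads), weights=weights)[0]
        shown.append(ad)
    return shown


probs = exploitation_probs(bids=[1.0, 0.5], clicks=[2, 1], explore_rounds=4)
shown = run_all([1.0, 0.5], [[1, 0]] * 6, explore_rounds=2, rng=random.Random(1))
```

In the small example, step (L1) contributes probabilities 0.5 and 0.125, and the leftover 0.375 is split evenly, so each exploitation-round probability is linear in the corresponding bid and in the observed click count.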

Discussion. We design ALL to ensure that the allocation probabilities depend on CTRs and bids in a simple, 
linear way. Below we explain why this "linear dependence" property is useful, and discuss some of the 
challenges in the analysis of ALL. 

Let the allocation-vector be a vector $a \in \mathbb{R}^m$ whose $j$-th component is the expected number of times ad
$j$ is allocated by ALL. For a given vector of CTRs, the allocation-range is the set of all allocation-vectors
that can be realized by ALL. We conjecture that the allocation-range needs to depend on CTRs in order for
an allocation rule to satisfy stochastic CMON and be, in some sense, non-trivial. (In Section 6.3 we prove a
version of this conjecture that is restricted to stochastically MIDR allocation rules.) The "linear dependence"
property of ALL ensures that the allocation-range does depend on CTRs.

For example, consider an allocation rule which has an exploration phase of fixed duration, picks the 
best (estimated) ad based on the clicks received so far, and sticks with this ad from then on. This allocation
rule is ex-post truthful in the single-parameter setting, and is perhaps the most natural candidate for a
reasonable, easy-to-analyze allocation rule for our setting. However, the allocation-range of this allocation 
rule does not depend on CTRs (because the set of possible options for exploitation is fixed: any one ad can 
be chosen). 

Further, the proof technique that we use in the analysis of ALL essentially requires us, for every given 
agent, to solve a system of equations where the unknowns are this agent's bids and the parameters are the 
CTRs and the components of the allocation vector. The allocation probabilities in ALL are explicitly defined 
in terms of bids in order to enable us to solve this system of equations in a desirable way; this is another 
place where the "linear dependence" property of ALL is helpful. 

The subtle point in our analysis of ALL - or, it seems, in any analysis using the same proof technique - 
is that one needs to ensure that the allocation vector is a maximizer of a certain expression, which requires 
us to prove the positive-definiteness of the corresponding Hessian matrix. The "linear dependence" property 
of ALL enables us to argue about the Hessian matrix in a useful way. 

As we discovered, the positive-definiteness of the Hessian should not be taken for granted: indeed, it 
fails for a number of otherwise promising allocation rules with better performance. We believe that further 
progress on stochastically CMON allocation rules would require a more systematic understanding of how 
changes in the allocation rule propagate through the analysis and affect the Hessian matrix. 

Guarantees for ALL for $\ge 2$ agents. A problem instance is called uniform if the product $v_j \mu_j$ is the same
for all $j$, and non-uniform otherwise. Note that for uniform problem instances RND is optimal, and in fact all
allocation rules without skips have the same expected welfare, and are all optimal. We will assume that all 
values-per-click are at most 1, and that all CTRs are strictly positive. 

Note that instances on which RND performs very poorly are those where for one ad j the product $v_j \mu_j$ is
large while for all other ads this product is very low. On the other hand, for such inputs ALL plays the best 
ad significantly more often. 

We next present a parameter that aims to quantify the divergence of the instance from uniform and will
be used to measure the performance of ALL. A problem instance is called σ-skewed, for some $\sigma \in [1, m]$, if
it satisfies

$$(M_2)^2 \geq \sigma\,(M_1)^2, \quad \text{where } M_t = \Big(\tfrac{1}{m}\sum_{j=1}^{m} (v_j \mu_j)^t\Big)^{1/t}. \tag{16}$$

Note that problem instances can be σ-skewed for any given $\sigma \in [1, m]$. Uniform problem instances are
1-skewed, and instances in which only one ad is good while all other ads have value 0 are m-skewed.
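The two extreme cases can be checked numerically. The sketch below assumes that $M_t = \big(\tfrac{1}{m}\sum_j (v_j\mu_j)^t\big)^{1/t}$, so that the largest σ for which an instance is σ-skewed is $(M_2/M_1)^2$; the function names `power_mean` and `max_skew` are hypothetical.

```python
from typing import Sequence


def power_mean(values: Sequence[float], ctrs: Sequence[float], t: int) -> float:
    """M_t = ((1/m) * sum_j (v_j * mu_j)^t)^(1/t); this form of M_t is assumed."""
    m = len(values)
    return (sum((v * mu) ** t for v, mu in zip(values, ctrs)) / m) ** (1.0 / t)


def max_skew(values: Sequence[float], ctrs: Sequence[float]) -> float:
    """Largest sigma with (M_2)^2 >= sigma * (M_1)^2, i.e. (M_2 / M_1)^2."""
    m1 = power_mean(values, ctrs, 1)
    m2 = power_mean(values, ctrs, 2)
    return (m2 / m1) ** 2


# Uniform instance: every product v_j * mu_j equals 0.25.
uniform_skew = max_skew([0.5, 1.0, 0.25], [0.5, 0.25, 1.0])
# One good ad among m = 4; all other ads have value 0.
single_good_skew = max_skew([1.0, 0.0, 0.0, 0.0], [0.5, 0.5, 0.5, 0.5])
```

Under these assumptions the uniform instance has skew 1 and the single-good-ad instance has skew m = 4, matching the two endpoints of the range [1, m].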

Let $W_0(\mathrm{ALL})$ be the expected per-round welfare for the exploitation phase of ALL, and let $W_0(\mathrm{RND})$ be
the expected per-round welfare for RND. Note that $W_0(\mathrm{RND}) = M_1$. The properties of ALL with at least
two agents are captured by the next lemma (which is the main technical lemma in this section); its proof is
deferred to Section 6.2.

Lemma 6.1. With at least two agents, allocation rule ALL satisfies the following:

(a) If the CTRs for all ads are strictly positive then ALL satisfies stochastic CMON.

(b) For $W_0(\mathrm{ALL})$ and $W_0(\mathrm{RND})$ as defined above it holds that

$$W_0(\mathrm{ALL}) - W_0(\mathrm{RND}) = (M_2)^2 - (M_1)^2.$$

In particular, W(ALL) > W(RND) for all non-uniform problem instances.

The allocation rule ALL does not have the property that scaling all bids by a common factor scales the 
expected welfare by the same factor; therefore it is not MIDR (see Section 6.3 for the definition of MIDR, as
it applies to our setting). 

Reduction to the single-agent case. For a single agent, we define our allocation rule ALL as follows: we 
simulate a run of ALL with a single round of exploration and two agents, where the second agent is a dummy 
agent with a single ad. The dummy agent submits a bid of zero for his ad, and we fix its CTR to \ (any CTR 
works). This completes the specification of ALL. 

Denote the resulting two-agent allocation rule by ALL*. The single-agent allocation rule satisfies CMON
because so does ALL*. Since the dummy agent does not contribute welfare (because of the zero bid), we
have $W_0(\mathrm{ALL}) = W_0(\mathrm{ALL}^*)$. Applying Lemma 6.1(b) to ALL*, which has m + 1 ads, we see that

$$W_0(\mathrm{ALL}) = M_1^* + (M_2^*)^2 - (M_1^*)^2, \quad \text{where } M_t^* = \Big(\tfrac{1}{m+1}\sum_{j=1}^{m} (v_j \mu_j)^t\Big)^{1/t}. \tag{17}$$

We summarize the useful properties of ALL in the following lemma:

Lemma 6.2. Consider the case of a single agent; assume $\mu_j > 0$ for all ads j. Then ALL satisfies stochastic
CMON, and its welfare in exploitation rounds satisfies Equation (17). In the one exploration round, ALL
obtains welfare $\tfrac{m}{m+1}\, W_0(\mathrm{RND})$.

Main provable guarantee. Let $\mathcal{M}_\delta$ be the mechanism obtained by applying Theorem 3.1 to ALL with
parameter $\delta \in (0, 1)$. The main result of this section follows.

Theorem 6.3. Consider a multi-parameter MAB domain with $v_j \leq 1$ and $\mu_j > 0$ for every ad j. Then
mechanism $\mathcal{M}_\delta$ is stochastically truthful, for every $\delta \in (0, 1)$.

Consider σ-skewed problem instances, and assume $\max_{j \in A} v_j \mu_j \geq \epsilon > 0$. There exists $\delta \in (0, 1)$ such
that mechanism $\mathcal{M}_\delta$ satisfies the following:

(a) $W(\mathcal{M}) > W(\mathrm{RND})$ on all problem instances with at least two agents, as long as $\sigma > 1$.

(b) W(M) > W(RND) = W(SubSample) on all problem instances with a single agent with m ads, as 
longasa>l + ^ + ^2_. 

(c) Suppose there exists an agent with k > m/2 ads; w.l.o.g. assume this is agent 1. Then W{M) > 

'ju- 
ke 



W(SubSample) on all problem instances such that a > 1 + "^T ^ when for all agents i > 1 all 



private values are 

The theorem follows from Lemma 6.1, Lemma 6.2 and Theorem 3.1 via straightforward computations,
some of which we omit from this version. Recall that for each $\delta > 0$ we have $W(\mathcal{M}_\delta) \geq (1 - \delta)\, W(\mathrm{ALL})$.

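Proof of Theorem 6.3(a). Assume $M_2 \geq (1 + \epsilon) M_1$ and $\max_{j \in A} b_j \mu_j \geq \epsilon$ for some $\epsilon > 0$. Then, using the notation
of Lemma 6.1(b), we have $M_1 \geq \epsilon/m$, and therefore

$$W_0(\mathrm{ALL}) - W_0(\mathrm{RND}) \geq M_1^2 \big((1 + \epsilon)^2 - 1\big) \geq M_1\, \tfrac{2\epsilon^2}{m}.$$

Recall that $T_0$ is the duration of exploration in ALL, and T is the time horizon. Then:

$$\begin{aligned}
W(\mathrm{RND}) &= T\, W_0(\mathrm{RND}) = T M_1, \\
W(\mathrm{ALL}) &= T_0\, W_0(\mathrm{RND}) + (T - T_0)\, W_0(\mathrm{ALL}) \\
&= W(\mathrm{RND}) + (T - T_0)\big(W_0(\mathrm{ALL}) - W_0(\mathrm{RND})\big) \\
&\geq W(\mathrm{RND}) + \gamma\, W(\mathrm{RND}), \quad \text{where } \gamma = \tfrac{2\epsilon^2}{m} \cdot \tfrac{T - T_0}{T}, \\
W(\mathcal{M}) &\geq (1 - \eta)\, W(\mathrm{ALL}) \geq (1 - \eta)(1 + \gamma)\, W(\mathrm{RND}).
\end{aligned}$$

Thus, to ensure that $W(\mathcal{M}) > W(\mathrm{RND})$, it suffices to take $\eta < 1 - \tfrac{1}{1 + \gamma}$. □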
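The last step is simple arithmetic and can be sanity-checked as follows. This is only a sketch: the value of γ below is hypothetical, and the helper names `loss_threshold` and `guaranteed_ratio` are introduced here for illustration.

```python
def loss_threshold(gamma: float) -> float:
    """Return 1 - 1/(1 + gamma); any eta below this keeps (1 - eta)(1 + gamma) > 1."""
    return 1.0 - 1.0 / (1.0 + gamma)


def guaranteed_ratio(eta: float, gamma: float) -> float:
    """Lower bound on W(M) / W(RND) from the last line of the derivation."""
    return (1.0 - eta) * (1.0 + gamma)


gamma = 0.25                        # hypothetical value of the gain parameter
threshold = loss_threshold(gamma)   # largest admissible loss fraction (exclusive)
ratio = guaranteed_ratio(0.5 * threshold, gamma)
```

Choosing η at half the threshold leaves a strict improvement over RND, while η equal to the threshold gives a ratio of exactly 1 (up to rounding), which is why the inequality on η must be strict.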

Proof Sketch of Theorem 6.3(bc). For part (b), recall that $W_0(\mathrm{RND}) = W_0(\mathrm{SubSample}) = \tfrac{1}{m} \sum_{j=1}^{m} b_j \mu_j$.
With a simple computation which we omit from this version, one derives that W(„4) > ^(RND). We prove 
W(Ai) > W(RW) using a computation similar to the one in the proof of part (a), we omit the details. 

For part (c), note that Wo (RND) = Mi and (under the assumptions in Theorem l6.31 c)). W(SubSample) < 
^Mi. Again, using a simple computation one can show that W(^4) > W(SubSample), and then pick a 
sufficiently small 5 as in the proof of part (a). □ 

6.2 Proof of the main technical lemma (Lemma 6.1)

Let us set up some notation. Consider an exploitation round in the execution of ALL. For each ad j, let $E_j$
be the event that ad j is chosen in line (L1) of the algorithm's specification. Let $E_u$ be the remaining event
in line (L2) when the ad is chosen uniformly at random. Denote $x_j = \Pr[E_j]$, and note that for each ad j,

$$x_j = \Pr[E_j] = b_j\, \mathbb{E}[n_j]/T_0 = \tfrac{1}{m}\, b_j \mu_j.$$



'One can also derive a version of this result where the private values for all agents i > 1 are smaller than 5, for some 5 e. 
We omit the easy details. 

10 Note that the instances considered in this result generalize the instances we have discussed before. Those are instances in
which one agent has all but one ad, and only one of his ads has positive value, while all the rest of the ads (his and others') have
value 0.

Proof of Lemma 6.1(b). Consider a round in the exploitation phase of ALL. Partition this round into events
$\mathcal{P} = \{E_1, \ldots, E_m; E_u\}$. For each event $E \in \mathcal{P}$ in this partition, let $W_0(E)$ be the expected per-round welfare
of ALL from this event, so that $W_0(\mathrm{ALL}) = \sum_{E \in \mathcal{P}} W_0(E)$. Note that $W_0(E_u) = \Pr[E_u]\, W_0(\mathrm{RND})$. Further,
$W_0(E_j) = b_j \mu_j \Pr[E_j] = m\, x_j^2$ for each ad j.

It is easy to see that $W_0(\mathrm{RND}) = \tfrac{1}{m} \sum_j b_j \mu_j = \sum_j x_j$. Since the probabilities $\Pr[E]$, $E \in \mathcal{P}$, sum to 1, it follows that

$$\begin{aligned}
W_0(\mathrm{ALL}) - W_0(\mathrm{RND}) &= \sum_{E \in \mathcal{P}} \big(W_0(E) - \Pr[E]\, W_0(\mathrm{RND})\big) \\
&= \sum_j W_0(E_j) - \sum_j \Pr[E_j]\, W_0(\mathrm{RND}) \\
&= m \sum_j x_j^2 - \Big(\sum_j x_j\Big)^2 \\
&= M_2^2 - M_1^2. \qquad \square
\end{aligned}$$
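The identity can be checked numerically on a small instance. The sketch below computes the exploitation-round welfare directly from the events $E_1, \ldots, E_m, E_u$, assuming $x_j = b_j\mu_j/m$, $M_1 = \tfrac{1}{m}\sum_j b_j\mu_j$ and $M_2^2 = \tfrac{1}{m}\sum_j (b_j\mu_j)^2$; the function name and the example bids and CTRs are hypothetical.

```python
from typing import Sequence, Tuple


def per_round_welfare(bids: Sequence[float], ctrs: Sequence[float]) -> Tuple[float, float]:
    """Return (W_0(ALL), W_0(RND)) for one exploitation round."""
    m = len(bids)
    products = [b * mu for b, mu in zip(bids, ctrs)]
    x = [v / m for v in products]            # x_j = Pr[E_j]
    w_rnd = sum(products) / m                # uniform choice over the m ads
    w_targeted = sum(v * xj for v, xj in zip(products, x))  # events E_1..E_m
    w_uniform = (1.0 - sum(x)) * w_rnd       # event E_u
    return w_targeted + w_uniform, w_rnd


bids, ctrs = [1.0, 0.5, 0.2], [0.8, 0.4, 0.1]
w_all, w_rnd = per_round_welfare(bids, ctrs)
m = len(bids)
m1_sq = (sum(b * mu for b, mu in zip(bids, ctrs)) / m) ** 2
m2_sq = sum((b * mu) ** 2 for b, mu in zip(bids, ctrs)) / m
```

On this non-uniform instance the gap `w_all - w_rnd` agrees with `m2_sq - m1_sq` and is strictly positive.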

For Lemma 6.1(a), we rely on the following characterization of CMON from prior work:

Lemma 6.4. Consider a function $f : S \to \mathbb{R}^k$, where $S \subset \mathbb{R}^k$. Let $f(S) \subset \mathbb{R}^k$ be the image of f. Then f
is CMON if and only if it is an affine maximizer, i.e.

$$f(x) = \operatorname*{argmax}_{y \in f(S)} \big[x \cdot y - g(y)\big] \quad \text{for some function } g : f(S) \to \mathbb{R}.$$

Proof of Lemma 6.1(a). Assume that there are at least two agents, and all CTRs are strictly positive. Without
loss of generality, let us focus on agent 1. We will use the following notation. Let $A = \{1, \ldots, k\}$ be the
set of ads submitted by agent 1. Here k is the number of ads submitted by agent 1; note that $k < m$. Let
$b = (b_1, \ldots, b_k)$ be the vector of bids for agent 1, where $b_j$ is the bid on ad j. Let $B = [0, 1]^k$ be the set of
all possible bid vectors for agent 1. Let $\mu = (\mu_1, \ldots, \mu_k)$ be the vector of CTRs for agent 1. We will use
both i and j to index ads.

Throughout the proof, let us keep the bids of all other agents fixed. Let $C_{i,t}(b)$ be the expected number
of clicks that ad i receives in round t of ALL, given the bid vector b, where the expectation is taken over all
realizations of the clicks and over the randomness in the algorithm.^11

Let $C_t(b) = (C_{1,t}(b), \ldots, C_{k,t}(b))$ be the round-t vector over the ads of agent 1, and let $C(b) = \sum_t C_t(b)$
be the vector whose i-th component is the total expected number of clicks for ad i.

We need to prove that the function $C : B \to \mathbb{R}^k$ satisfies CMON. It suffices to prove that CMON is satisfied
for each round t separately, i.e. that it is satisfied for each function $C_t$. This is obvious if t is an exploration
round. In the rest of the proof we fix t to be an exploitation round.

By Lemma 6.4 it suffices to prove that $C_t(b)$ is an affine maximizer, i.e. that

$$C_t(b) = \operatorname*{argmax}_{p \in C_t(B)} \Big[\sum_{j \in A} b_j p_j - G(p, \mu)\Big] \tag{18}$$

for some function $G(p, \mu) : C_t(B) \times [0, 1]^k \to \mathbb{R}$, where $C_t(B) \subset [0, 1]^k$ is the image of $C_t$. Crucially, the
function G cannot depend on b.^12

Denote $p^* = C_t(b)$. If $p^*$ is an interior point of $C_t(B)$ and function G is differentiable, then
Equation (18) implies the following:

$$\frac{\partial}{\partial p_i} G(p^*, \mu) = b_i \quad \text{for each ad } i, \text{ bid vector } b \text{ and CTR vector } \mu. \tag{19}$$



11 Here it is more convenient to use a slightly different notation for click-vectors, compared to Section 4.

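12 Note that G can depend on the CTRs, even though the mechanism does not know them. This is because G is only used for the
analysis, to prove CMON, and it is not actually used in the mechanism.

We will construct a function $G(p, \mu)$ so that it satisfies Equation (19).

Here and on, $i \in A$ denotes an arbitrary ad of agent 1, and sums over $A_0$ range over the ads of all agents.
Recall that $x_i = \Pr[E_i] = \tfrac{1}{m} b_i \mu_i$. Thus:

$$\begin{aligned}
\Pr[E_u] &= 1 - \sum_{j \in A_0} \Pr[E_j] = 1 - \sum_{j \in A_0} x_j, \\
C_{i,t}(b) &= \mu_i \Big(\Pr[E_i] + \tfrac{1}{m} \Pr[E_u]\Big) = \mu_i \Big(x_i + \tfrac{1}{m} - \tfrac{1}{m} \sum_{j \in A_0} x_j\Big).
\end{aligned}$$

Recalling the notation $p^* = C_t(b)$ and solving for $x_i$, we obtain

$$\begin{aligned}
p^*_i / \mu_i &= x_i + \tfrac{1}{m} - \tfrac{1}{m} \sum_{j \in A_0} x_j, \\
\sum_{j \in A} p^*_j / \mu_j &= \sum_{j \in A} x_j + \tfrac{k}{m} - \tfrac{k}{m} \sum_{j \in A_0} x_j \\
&= \tfrac{k}{m}(1 - Y) + \big(1 - \tfrac{k}{m}\big) \sum_{j \in A} x_j, \quad \text{where } Y = \sum_{j \in A_0 \setminus A} x_j, \\
p^*_i / \mu_i &= x_i - \alpha \sum_{j \in A} p^*_j / \mu_j + \beta,
\end{aligned}$$

where $\alpha = \tfrac{1}{m - k}$ and $\beta = \tfrac{1}{m} - \alpha \big(Y - \tfrac{k}{m}\big)$. It follows that

$$b_i = \frac{m}{\mu_i}\, x_i = \frac{m}{\mu_i} \Big(\frac{p^*_i}{\mu_i} - \beta + \alpha \sum_{j \in A} \frac{p^*_j}{\mu_j}\Big). \tag{20}$$

Denote the RHS of Equation (20) by $f_i(p^*, \mu)$. We have proved that $b_i = f_i(p^*, \mu)$ for each ad i. Thus to
obtain Equation (19) it suffices to pick $G(p, \mu)$ so that it satisfies

$$\frac{\partial}{\partial p_i} G(p, \mu) = f_i(p, \mu) \quad \text{for each } i \in A. \tag{21}$$

Integrating $f_i(p, \mu)$ over $p_i$, for each ad i, and combining the resulting expressions, we obtain

$$G(p, \mu) = -m\beta \sum_{i \in A} \frac{p_i}{\mu_i} + \frac{m(1 + \alpha)}{2} \sum_{i \in A} \frac{p_i^2}{\mu_i^2} + m\alpha \sum_{i, j \in A:\, i < j} \frac{p_i p_j}{\mu_i \mu_j}. \tag{22}$$

It is easy to check that this G satisfies Equation (21), which in turn implies Equation (19).^13 It follows
that for this G, $p = p^*$ is a critical point in Equation (18). From here on we will use the G as defined
in Equation (22).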
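The critical-point property can be checked numerically on a toy instance. The sketch below is an illustration under stated assumptions: it takes $\alpha = 1/(m-k)$, $\beta = \tfrac{1}{m} - \alpha(Y - \tfrac{k}{m})$ and the quadratic form of G reconstructed from Equation (21), and it uses a finite-difference gradient; the function names and the example bids and CTRs are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple


def clicks_and_potential(
    bids: Sequence[float], ctrs: Sequence[float], k: int
) -> Tuple[List[float], Callable[[Sequence[float]], float]]:
    """bids/ctrs list all m ads; the first k belong to agent 1 (k < m)."""
    m = len(bids)
    x = [b * mu / m for b, mu in zip(bids, ctrs)]   # x_j = Pr[E_j]
    total = sum(x)
    # Expected clicks of agent 1's ads in an exploitation round: p* = C_t(b).
    p_star = [ctrs[i] * (x[i] + (1.0 - total) / m) for i in range(k)]
    alpha = 1.0 / (m - k)
    y = sum(x[k:])                                  # mass of the other agents' ads
    beta = 1.0 / m - alpha * (y - k / m)

    def potential(p: Sequence[float]) -> float:
        q = [p[i] / ctrs[i] for i in range(k)]
        cross = sum(q[i] * q[j] for i in range(k) for j in range(i + 1, k))
        return (-m * beta * sum(q)
                + 0.5 * m * (1.0 + alpha) * sum(v * v for v in q)
                + m * alpha * cross)

    return p_star, potential


def numeric_gradient(f: Callable[[Sequence[float]], float],
                     p: Sequence[float], h: float = 1e-5) -> List[float]:
    """Central finite differences, one coordinate at a time."""
    grad = []
    for i in range(len(p)):
        up, down = list(p), list(p)
        up[i] += h
        down[i] -= h
        grad.append((f(up) - f(down)) / (2 * h))
    return grad


bids, ctrs, k = [0.6, 0.3, 0.4], [0.5, 0.8, 0.5], 2
p_star, potential = clicks_and_potential(bids, ctrs, k)
gradient = numeric_gradient(potential, p_star)
```

On this instance the numerical gradient of G at $p^*$ matches agent 1's bids (0.6 and 0.3) to within the finite-difference tolerance, as Equation (19) requires.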

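We claim that the critical point $p = p^*$ is in fact a local maximum in Equation (18). Equivalently, we
claim that $p = p^*$ is a local minimum of the function

$$\Lambda(p) = G(p, \mu) - p \cdot b : \mathbb{R}^k \to \mathbb{R}.$$

For that, it suffices to prove that the Hessian matrix H of $\Lambda(\cdot)$, defined by

$$H_{ij} = \frac{\partial^2}{\partial p_i\, \partial p_j} \Lambda(p) = \frac{\partial^2}{\partial p_i\, \partial p_j} G(p, \mu),$$

is positive-definite for $p = p^*$. Note that for any $p \in \mathbb{R}^k$ it holds that

$$H_{ij} = \begin{cases} r\, \rho_i^2, & i = j, \\ \rho_i \rho_j, & i \neq j, \end{cases} \tag{23}$$

where $\rho_i = \sqrt{m\alpha}/\mu_i$ for each $i \in A$ (with $\alpha = \tfrac{1}{m-k}$), and $r = 1 + \tfrac{1}{\alpha} = 1 + m - k \geq 2$. By Claim 6.5 such a matrix is
positive-definite.

13 Write $f_i(p, \mu) = \phi_i + \sum_{j \in A} p_j \gamma_{ij}$ for some numbers $\phi_i$ and $\gamma_{ij}$. Then a function $G(p, \mu)$ satisfying Equation (21) exists if
and only if $\gamma_{ij} = \gamma_{ji}$ for all $i \neq j$.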

To complete the proof, we will show that $p = p^*$ is the global maximum in Equation (18) over all
$p \in \mathbb{R}^k$. For that, it suffices to prove that $p = p^*$ is the unique critical point over the entire $\mathbb{R}^k$, i.e. the
unique solution for the system

$$\frac{\partial}{\partial p_i} G(p, \mu) = b_i \quad \text{for each ad } i \in A. \tag{24}$$

Let us re-write this system using Equation (22). (We find it convenient to use the notation r and $\rho_i$, as
in Equation (23).) Namely, for each $i \in A$ we have:

$$b_i + \frac{m\beta}{\mu_i} = p_i\, (r \rho_i^2) + \sum_{j \in A \setminus \{i\}} p_j\, (\rho_i \rho_j).$$

It follows that the system in Equation (24) is equivalent to

$$H \cdot p = w,$$

where the $k \times k$ matrix H is defined by Equation (23), and the vector $w \in \mathbb{R}^k$ is defined by $w_i = b_i + \tfrac{m\beta}{\mu_i}$
for all i. The matrix H is non-singular (since it is positive-definite), so the system $H \cdot p = w$ has a unique
solution p. □

Claim 6.5. Consider a $k \times k$ matrix H given by Equation (23), where $\rho_1, \ldots, \rho_k$ are arbitrary positive
numbers. Assume $r > 1$. Then H is positive definite.

Proof. We will use the Gram matrix characterization of positive-definite matrices. Namely, to prove that H
is positive-definite, it suffices to construct finite-dimensional vectors $w_1, \ldots, w_k$ such that $H_{ij} = w_i \cdot w_j$
for all i, j and the vectors are linearly independent. Consider vectors $w_1, \ldots, w_k \in \mathbb{R}^{k+1}$ defined as
follows:

$$w_i(\ell) = \begin{cases} \sqrt{r - 1}\, \rho_i, & \ell = i,\ \ell \leq k, \\ 0, & \ell \neq i,\ \ell \leq k, \\ \rho_i, & \ell = k + 1. \end{cases}$$

It is easy to see that these vectors satisfy the desired properties. □
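The Gram construction can be verified mechanically. The sketch below assumes the matrix form $H_{ii} = r\rho_i^2$, $H_{ij} = \rho_i\rho_j$ for $i \neq j$; the helper names and the sample values of ρ and r are hypothetical, and positive-definiteness is tested with a plain Cholesky factorization written out by hand.

```python
import math
from typing import List, Sequence


def hessian(rho: Sequence[float], r: float) -> List[List[float]]:
    """H[i][i] = r * rho_i^2 and H[i][j] = rho_i * rho_j for i != j."""
    k = len(rho)
    return [[(r if i == j else 1.0) * rho[i] * rho[j] for j in range(k)]
            for i in range(k)]


def gram_vectors(rho: Sequence[float], r: float) -> List[List[float]]:
    """Vectors w_1..w_k in R^(k+1) from the proof of the claim."""
    k = len(rho)
    vectors = []
    for i in range(k):
        w = [0.0] * (k + 1)
        w[i] = math.sqrt(r - 1.0) * rho[i]   # own coordinate
        w[k] = rho[i]                        # shared last coordinate
        vectors.append(w)
    return vectors


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def is_positive_definite(matrix: Sequence[Sequence[float]]) -> bool:
    """Cholesky succeeds exactly when a symmetric matrix is positive definite."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = matrix[i][j] - sum(lower[i][t] * lower[j][t] for t in range(j))
            if i == j:
                if s <= 0.0:
                    return False
                lower[i][i] = math.sqrt(s)
            else:
                lower[i][j] = s / lower[j][j]
    return True


rho, r = [0.5, 2.0, 1.5], 2.0
H = hessian(rho, r)
W = gram_vectors(rho, r)
gram = [[dot(W[i], W[j]) for j in range(len(rho))] for i in range(len(rho))]
```

With r = 2 the Gram matrix of the vectors reproduces H and the factorization succeeds; with r = 1 the matrix degenerates to the rank-one matrix $\rho\rho^\top$ and the check fails, which shows why the assumption r > 1 is needed.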



6.3 An impossibility result for stochastically MIDR allocation rules 

Let us consider stochastically MIDR allocation rules for multi-parameter MAB mechanisms. We show that 
any such allocation rule (with a significant but reasonable restriction) is essentially trivial. 

Let us formulate what it means for a given allocation rule A to be stochastically MIDR in our setting, in
a specific way that is convenient for us to work with. For a given bid vector $b \in (0, \infty)^m$, and CTR vector
$\mu \in [0, 1]^m$, let the allocation-vector be a vector $a = (a_1, \ldots, a_m)$ such that $a_j$ is the expected number of
times ad $j \in [m]$ is allocated by A. Note that the expected welfare corresponding to a given allocation vector
a is simply $\sum_j a_j b_j \mu_j$. Let $F_0 = \{a \in [0, T]^m : \sum_j a_j \leq T\}$ be the set of all feasible allocation-vectors.
(The sum of the entries can be less than T because skips are allowed.) Then A is stochastically MIDR if and
only if for all bid vectors b and all CTR vectors μ it holds that

$$W(A(b)) = \max_{a \in F} \sum_j a_j b_j \mu_j \tag{25}$$

for some $F \subset F_0$ that does not depend on b, but can depend on μ.

Note that Equation (25) does not immediately provide a stochastically truthful mechanism via VCG
payments, because the computation of VCG payments is not immediately feasible without knowing the
CTRs. In fact, Equation (25) does not even provide an immediate way to compute the allocation (assuming
$|F| \geq 2$), again because of the issue of not knowing the CTRs. This is in stark contrast with the prior
work on MIDR (which studied settings without the "no-simulation" constraint) where the MIDR property
immediately gave rise to a truthful mechanism via the VCG payment rule.

However, if an allocation rule satisfies Equation (25) then a truthful mechanism can be obtained, with
an arbitrarily small loss in welfare, via the transformation in Wilkens and Sivan (2012).

We consider a restricted version of Equation (25) where the range F cannot depend on the CTRs (we
will call such range F CTR-independent). We prove that any such allocation rule is welfare-equivalent to a 
time-invariant allocation rule. Here an allocation rule is called time-invariant if in each round, it picks an 
ad independently from the same distribution over ads (this distribution may depend on the bids). Note that 
time-invariant allocation rules ignore the feedback that they receive (i.e., the clicks), and thus cannot adjust 
to the CTRs. 

Lemma 6.6. Consider a multi-parameter MAB domain. Let A be a stochastically MIDR allocation rule
with CTR-independent range. For each bid vector b there exists an allocation-vector $a = a(b) \in F_0$ such that
$W(A(b)) = \sum_j a_j b_j \mu_j$ for all CTR vectors μ. So A is welfare-equivalent to a time-invariant allocation
rule (where, letting T be the time horizon, each ad j is chosen with probability $a_j(b)/T$).

Corollary 6.7. In the setting of Lemma 6.6 the approximation ratio of A (compared to the welfare of the
best ad) is at least m on some problem instances.



Proof of Lemma 6.6. Let us fix the bid vector b and consider both sides of Equation (25) as functions of μ.
First, we note that the expected welfare W(A(b)) is a finite-degree polynomial in variables $\mu_1, \ldots, \mu_m$.^14
This is because, letting $A_j(b, \rho, t)$ be the probability that ad j is displayed at round t given click-realization
ρ, it holds that

$$W(A(b)) = \sum_{\rho} \Pr[\rho] \sum_{j,t} b_j\, \rho(t, j)\, A_j(b, \rho, t). \tag{26}$$

Here the outer sum is over all click-realizations ρ, and the inner sum is over all rounds t and all ads j. $\Pr[\rho]$
is the probability that ρ is realized for the given CTR vector. Equation (26) is a polynomial in the CTRs
because for each click-realization ρ, the inner sum is a fixed number, and $\Pr[\rho]$ is a polynomial in the CTRs
of degree T.

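Let us re-write Equation (25) as follows:

$$W(A(b)) = \max_{\beta \in F_b} \beta \cdot \mu, \quad \text{where } F_b = \{\beta \in \mathbb{R}^m : \beta_j = a_j b_j \text{ for each } j,\ a \in F\}. \tag{27}$$

Since $F_b$ is fixed, the right-hand side of Equation (27) is uniquely determined by μ; denote it W(μ).
Note that for each $\beta \in F_b$, it holds that

$$W(\mu) = \beta \cdot \mu \quad \text{if and only if} \quad (\beta - \beta') \cdot \mu \geq 0 \text{ for all } \beta' \in F_b.$$

For each $\beta \in \mathbb{R}^m$ consider the half-space $H_\beta = \{\mu \in [0, 1]^m : \beta \cdot \mu \geq 0\}$. Then

$$W(\mu) = \beta \cdot \mu \quad \text{if and only if} \quad \mu \in S_\beta, \quad \text{where } S_\beta = \bigcap_{\beta' \in F_b} H_{\beta - \beta'}.$$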

14 Namely, the degree is at most T, the time horizon.

Note that $S_\beta$ is a convex set, as an intersection of convex sets. Moreover, all half-spaces in the intersection
contain the 0-vector, and hence so does $S_\beta$. Therefore if $W(\mu) = \beta \cdot \mu$ for some μ and $\beta \in F_b$ then by
convexity for any $z \in [0, 1]$ it holds that $z\mu \in S_\beta$, and therefore $W(z\mu) = z(\beta \cdot \mu) = z\, W(\mu)$. We have
proved the following:

$$W(z\mu) = z\, W(\mu) \quad \text{for every } z \in [0, 1] \text{ and } \mu \in [0, 1]^m. \tag{28}$$

Now recall that W(μ) is a finite-degree polynomial in μ. A known fact about multi-variate polynomials
is that any finite-degree polynomial in μ which satisfies Equation (28) is in fact of the form $W(\mu) = \gamma \cdot \mu$
for some $\gamma \in \mathbb{R}^m$.
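For completeness, here is one way to see the known fact used above. This is a sketch, not the authors' argument; the decomposition of W into homogeneous components $W_d$ and the degree bound D are notation introduced only for this illustration.

```latex
\begin{align*}
W(\mu) &= \sum_{d=0}^{D} W_d(\mu),
  && W_d \text{ homogeneous of degree } d, \\
W(z\mu) &= \sum_{d=0}^{D} z^{d}\, W_d(\mu)
  && \text{by homogeneity of each } W_d, \\
\sum_{d=0}^{D} z^{d}\, W_d(\mu) &= z \sum_{d=0}^{D} W_d(\mu)
  && \text{for all } z \in [0,1], \text{ by Equation (28)}.
\end{align*}
```

For a fixed μ both sides are polynomials in z that agree on the whole interval [0, 1], so their coefficients coincide, which gives $W_d(\mu) = 0$ for every $d \neq 1$. Since this holds for all $\mu \in [0, 1]^m$, a set with non-empty interior, each such $W_d$ is the zero polynomial, and W equals its degree-1 homogeneous component, i.e. $W(\mu) = \gamma \cdot \mu$.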

Now, let $A = \{j : b_j > 0\}$ be the set of ads with non-zero bids. Define a vector $a \in \mathbb{R}^m$ by $a_j = \gamma_j / b_j$
for each ad $j \in A$, and $a_j = 0$ otherwise. To complete the proof, it remains to show that, letting T be the
time horizon, a/T is a valid distribution over the ads (assuming skips are allowed). That is, we need to show
that $a_j \geq 0$ and $\sum_{j \in A} a_j \leq T$. We use the fact that for any allocation rule, the expected welfare is at least 0
and at most that of always playing the best ad:

$$W(\mu) = \sum_{j \in A} a_j b_j \mu_j \in \big[0,\ T \max_j b_j \mu_j\big]. \tag{29}$$

Applying Equation (29) with μ being the unit vector in the direction $j \in A$, it follows that $W(\mu) = a_j b_j \geq 0$,
so $a_j \geq 0$. Now, let $B = (\max_{j \in A} b_j^{-1})^{-1}$ and define a CTR vector μ by $\mu_j = B / b_j$ for $j \in A$ and $\mu_j = 0$
otherwise.^15 Plugging this μ into Equation (29), we obtain $W(\mu) = \sum_{j \in A} B\, a_j \leq B T$, which implies
$\sum_{j \in A} a_j \leq T$, completing the proof. □